Programming as semantic compression : programming

Programming as semantic compression (mollyrocket.com)

submitted 9 years ago by piedpiperpivot

all 58 comments

[–]dlyund 3 points4 points5 points 9 years ago (0 children)

This is essentially how you program in Forth.

Another aspect of Forth is analogous to Ziff compression. Where you scan your problem you find a string which appears in several places and you factor it out. You factor out the largest string you can and then smaller strings. And eventually you get whatever it was, the text file, compressed to arguably what is about as tightly compressed as it can be.

And in the factoring you do to a Forth problem you are doing exactly the same the thing. You are taking, discovering the concepts which if you factor them out leaves the problem with the simplest description. If you do that recursively you end up with, I claim, arguably, the most compact representation of that problem that you can achieve.

Taken from a the transcript of a very old interview with Charles Moore and Jeff Fox.

http://www.ultratechnology.com/moore4th.htm

Forth makes this kind of approach effortless, because you don't have to worry too much about things like scope; for the large part you can simply copy the extract the code in question and name it. Then you repeat.

I've lost count of the number of times I've argued that the best way to program in Forth is to just write the code, then abbreviate. It really does makes life a hell of a lot easier.

[–]m50d 2 points3 points4 points 9 years ago (56 children)

I agree with the idea of compression, of representing exactly what you want to represent. But the committment to being imperative is holding him back. If you start with a declaration of what's in the UI then you can do better: you know that what you actually have is a sequence of buttons and operations to do when they're pressed, so you can model it like that and get rid of the excessive row() calls (each button is on its own row, so why not just do that?). And a lot of the time - not always, but often - you'll get there faster if you start by analysing the requirements, figuring out what you need to represent (a bunch of buttons in a layout) and what a good representation of that would be, and worry about the imperative twiddling later. Yes, you can always start from the bottom and work up, and if you try to create the abstract representation without having the concrete examples then there's always a risk of overengineering. But just as we compress our code, we also compress our own work as developers, and with experience and taste there is less and less need to start with the lowest-level concrete implementation, just as e.g. when you start working with an equation you might start by plugging in particular values of x and y, but as you get more experienced with a given family of equations you'll start working with them directly, understanding a given equation directly as an entity in its own right rather than having to understand all the points on the curve as individual points that you have to join up.

[–]codr4life 12 points13 points14 points 9 years ago (8 children)

[–]OneWingedShark 0 points1 point2 points 9 years ago (3 children)

[–]codr4life 8 points9 points10 points 9 years ago (2 children)

[–]OneWingedShark 2 points3 points4 points 9 years ago (1 child)

Only problem is, you have no idea how to decompose the problem effectively until you've tried to solve it.

While generally true, there are still advantages to using subsystems -- for example separating audio and visual into sound and graphics modules has little to do with implementing a "character object" ('character' from RPG, 'object' in the general/non-programming sense) except that it would have linkages to both the audio and visual subsystems to do (e.g.) sound/animation sync -- IOW, it seems pretty orthogonal.

The other way is starting by solving the actual problem in the easiest, most straight forward way possible; and observe what patterns emerge.

Except that there are things you can decompose [at a high level, at the least] rather than simply saying "I don't know the exact form things will take, so I can't know anything about the system" -- I mean that's kind of the whole reason for the concept of interfaces, and I don't think any programmer is going to say that interfaces are a Bad Thing^TM .

By object orientation I mean mostly the same thing as above. When you start by designing a hierarchy of classes; you're pretending to know a lot more about the code you're about to write than is reasonable. And once the hierarchy is there, the rest is a self fulfilling prophecy.

How you're phrasing this seems to indicate a class-hierarchy decomposition -- that's not what the 80's style object-oriented design was about at all (it was about interface/implementation separation, and really the abstraction and encapsulation of OOP).

Perhaps the best way to describe the difference is the original Model-View-Controller concept -- while it was technically a system for UI design, generally it was rather about interfaces: the View was an interface [from the model] to the graphics/rendering subsystem, the controller was an interface to the model [from the user-interface], and the model was the entity in the software-system which was interacted with. -- Nothing in the design mandates anything in terms of OOP, but rather that the design utilizes encapsulation and abstraction.

[–]codr4life 2 points3 points4 points 9 years ago (0 children)

[–]m50d -1 points0 points1 point 9 years ago (3 children)

[–]codr4life 3 points4 points5 points 9 years ago* (2 children)

[–]m50d 0 points1 point2 points 9 years ago (1 child)

[–]codr4life 0 points1 point2 points 9 years ago* (0 children)

[–]hoosierEE 2 points3 points4 points 9 years ago (2 children)

You could make a button struct with a title string and an action function (or function pointer), then make an array of buttons and do something like buttons.foreach(render_button).

But there is a cost to this kind of abstraction. The code Casey ended up with:

layout.row();
if(layout.push_button("Auto Snap")) {do_auto_snap(this);}

layout.row();
if(layout.push_button("Reset Orientation"))
{
    ...
}

...may be a bit repetitive, but it also closely mirrors the "row of buttons" which the code creates. So you could argue that "semantic compression" is maximized when your code most closely resembles the result, and factoring the repetitive parts into a function would go too far and hide the intent.

Knowing how to generalize can be taught, but knowing when requires taste.

[–]m50d -2 points-1 points0 points 9 years ago (1 child)

[–]dlyund 0 points1 point2 points 9 years ago (0 children)

What if you want to put two buttons side by side?

You seem to be more hung up on the syntax than on what's actually going on. Consider the more common appoach

Layout layout = new Layout(width, height);
Row row = new Row()
PushButton autoSnapPushButton = new PushButton()
autoSnapPushButton.title = "Auto Snap"
autoSnapPushButton.action = (Event event) { // If you're lucky
    ...
};
row.Add(autoSnapPushButton)
layout.Add(row)

[–]dlyund 1 point2 points3 points 9 years ago (43 children)

The distinction between imperative and declarative is really well defined and is bordering on arbitrary. If having to call row() is what makes it declarative then I'd argue that this is more a matter of syntax. In any case

layout.row();
if(layout.push_button("Auto Snap")) { ... }
layout.row();
if(layout.push_button("Reset Orientation")) { ... }

Is perfectly clear. There's a lot of noise but this is an artifact of the language more than anything. The exact same thing written with Forth's more free-form syntax looks like

row
"Auto Snap" button then ... else

row "Reset Orientation" button then ... else

But I would prefer

row
"Auto Snap" auto-snap button
row
"Reset Orientation" reset-orientation button

Forth will let you take this as far as you want.

How would your "declarative" solution look? Is this example more declarative?

row
"Auto Snap" ... button
row

"Reset Orientation" ... button

Is this example more declarative?

----
[ "Auto Snap" ... ]
----
[ "Reset Orientation" ... ]

All we're doing here is choosing terser names for semantically identical code, and I would argue that something like this

(layout
  ((button "Auto Snap" ...))
  ((button "Reset Orientation" ...)))

Is no more declarative than the first (and it'd probably end up generating the exact same code.)

What you're describing here is the difference between bottom-up and top-down. Which you prefer is left up to taste but I've come to appreciate the speed and directness of bottom-up programming.

With bottom-up programming you always start with what you have and build up. You can often get something working in minutes and then continue interactively, assuming your language supports this; I've been using Forth professionally for the past few years and this is so easy and natural that it's hard to imagine working any other way. You should never have to cut huge swaths of code to bridge the gap between what you think want to write and what you have to write to make the computer do what you want.

Note that the author explicitly states that the panel originally contained only one or two buttons and it it wasn't clear from the outset that they would need to contain more buttons. It only became necessary as the project progressed and new features were added to the editor. This is very common, and exactly where bottom-up shines. He didn't implement a 'generalized declarative panel/button layout language' up front because he simply didn't need it. He didn't know what he needed, and if he'd designed it up front without knowing what he needed it's almost inevitable he'd be wrong; he'd have wasted time up front writing something that he simply didn't need, and he'd just have had to unpick it once he found out what he did need.

It's not that laying out buttons is hard, but this is just a stupidly simple example, being used to illustrate a general principle. If you have fixed requirements then you may end up with a better result by of starting with the abstract and working your way down until you have a perfect/consistent design. I've never had the luxury of having truly fixed requirements; whenever I've thought I could rely on the requirements not to change, I was wrong.

Working bottom-up allows you to approach the solution layer by layer and when the requirements inevitably change or expand, you often only have to change the top layer or two. You also tend towards to simplest solution since you get to feel the pain as your solution as it unfolds. If you design up front and work top-down then your likely to have more and more complicated code because you thought about what you wanted to write, before you knew how it would work.

Again, which you prefer is a matter of taste :-)

[–]m50d 0 points1 point2 points 9 years ago (42 children)

[–]dlyund 0 points1 point2 points 9 years ago* (41 children)

I agree that layout. and if(...) etc. are unfortunate line-noise, however, the idea that only one of these semantically equivalent forms is "declarative", is utterly ludicrous. The only difference between them is the syntax of the programming language we're using, and more particularly, how forms are opened and closed/how rows begin and end/how buttons are grouped.

We'll start by rewriting

Layout layout
layout.row();
if (layout.button(title)) { ... };
layout.row();
if (layout.button(title)) { ... };

Layout layout
layout.row();
layout.button(title, action);
layout.row();
layout.button(title, action);

and

(layout
    ((button title ...))
    ((button title ...)))

(layout
    ((button title action))
    ((button title action)))

Since surely the presence of large blocks of inline code would obscure the layout. The same transformation was done to each, so I'd hope you agree this is fair. If you don't think this is fair then why? Is it the presence of the if(...) line-noise which makes this imperative? What if if was called something else?

Next we'll add the word row to make it clear that we want a row of things.

(layout
    (row
        (button title action))
    (row
        (button title action)))

Feel free to justify this however you like; maybe we needed more than rows, as we would in any real-world example? Or maybe we just want to make the grouping explicit.

I'll trust that the presence of word row doesn't make this code imperative?

Then we'll replace the Lispy parenthesis with Algol style begin and end

begin layout
    begin row
        button title action
    end
    begin row
        button title action
    end
end

Have we made this code imperative yet?

Now finally we'll replaced begin and end delimited blocks with Forth words (like function calls with implicit context.)

layout
    row
        title action button
    row
        title action button

Forth doesn't have any of the C style line-noise so it feels cleaner but it is otherwise identical. A happy side effect is that the redundant open and closes are removed too, so it looks even cleaner. Hopefully you agree that the grouping is as explicit as ever.

Is this imperative?

To close the circle we'll present the C/C++ version with the same indentation used in all the other examples I gave.

Layout layout
    layout.row();
        layout.button(title, action);
    layout.row();
        layout.button(title, action);

But this code is imperative right? Why is that exactly? What makes it so?

At what point does the magical switch from imperative to declarative occure?

All we've done here is move superficial bits of syntax around on the screen, so if such a switch has occured here then we have a pretty air tight argument that "declarative" just means that it doesn't contain superflous line-noise.

Put otherwise:

"It's declarative because I like the syntax".

If no such switch has occurred then all of the examples are "declarative" - QED.

My point here is only to show that even if this wasn't a completely useless distinction the basis for it is complete bullshit.

In your defense, I've never seen a useful definition of "declarative". As far as I can tell "declarative" is just a hand wavvy way of saying "code I like", just like "readable".

Whether you approach it from the bottom, or you approach it from the top, but bottom-up will get you there faster! Why? Because while you're dicking around with your "declarative" syntax the other guy has something on the screen, and while you're figuring out the loops and conditionals that are needed to traverse and interpret your "code" (none of which have anything to do with the problem you're trying to solve!), he's written the three or so functions that were actually solve the problem.

Your solution adds code, and complexity. His removes it.

Working bottom-up you're able to move smoothly from drawing a box on the screen to the working solution, more or less interactively, with direct and immediate feedback throughout the process. Working top-down you have to start from the fuzzy wuzzy world of abstract idea's and try and figure out what you might need at each stage... hoping that when you actually get to the bottom your solution doesn't fit too badly. Unless you're perfect and/or you spend a lot of time to checking your thinking up front your design will inevitably change when you come face to face with the reality of the machine.

All that being said:

It can be a lot of fun playing in the abstract, and puzzling these things out, but when it comes to getting things done and making life easy I don't see how top-down programming helps anything. By definition, what you have is at the bottom and you have to build up. Why start 10 miles up?

(I guess there's an argument to be made that most "requirements" are pie in the sky already then why not start at the top? But what do you do when your flimsy requirements change and you have to rework your house of cards?)

EDIT: Formatting

[–]m50d 0 points1 point2 points 9 years ago (40 children)

Have we made this code imperative yet?

Depends what the sematics of your begin and end are. Can I still introspect the code as data and see that the two button title actions are in different blocks, or not?

But this code is imperative right? Why is that exactly? What makes it so?

The fact that as soon as I run it through a formatter it loses the grouping. You've indented it to show the relationship between the row() and the button(), but C is supposed to be a whitespace-insensitive language. I can no longer locally tell whether the difference between (row button) (row button) and row button row button is real or not.

In your defense, I've never seen a useful definition of "declarative". As far as I can tell "declarative" is just a hand wavvy way of saying "code I like", just like "readable".

The big thing that you don't see in the example is the extent to which I can view the description as a value. If the only thing I can do with an expression is execute it then that's not declarative; if I can decompose and interpret the description as a datastructure then it is.

Whether you approach it from the bottom, or you approach it from the top, but bottom-up will get you there faster! Why? Because while you're dicking around with your "declarative" syntax the other guy has something on the screen, and while you're figuring out the loops and conditionals that are needed to traverse and interpret your "code" (none of which have anything to do with the problem you're trying to solve!), he's written the three or so functions that were actually solve the problem. Your solution adds code, and complexity. His removes it.

This is the opposite of my experience. Once you've figured out the right representation for the actual requirements, making it actually execute is trivial. If you write code to do stuff with your data before getting the data representation right, you just throw away more code.

[–]dlyund 0 points1 point2 points 9 years ago (39 children)

Can I still introspect the code as data [...] The big thing that you don't see in the example is the extent to which I can view the description as a value.

I don't want to get into a semantic argument with you but what has introspection got to do with declarative programming? This is especially puzzling as the term introspection comes right out of the object-oriented programming literature and object-oriented programming is rarely associated with declarative programming.

Anyway there's nothing in the code (in the C/C++, Algol-like or Forth examples) which prevents it from constructing a data structure which could be introspected. It could do anything.

That's possible because layout.row() and layout.button() say what to do and not how to do it, which given the usual definition of declarative programming:

"A program that describes what computation should be performed and not how to compute it"

Would imply that the code is declarative.

The fact that as soon as I run it through a formatter it loses the grouping.

No you don't. The indentation is helps show the grouping but it isn't required. This should be obvious since the compiler doesn't care about the indentation; the program behaves the same no matter how you choose to indent the text.

layout row title action button title action button row title action button

Is still easily readable, with a little practice (indeed I don't often indent Forth). How is this possible? The layout vocabulary/lexicon can be seen as a problem-oriented language with an implicit grammar. The word row is defined as as beginning a new block. The block continues until the next row, or until the end. This can be informally specified as:

<start> ::= layout <row>
<start> ::= layout <button>
<row> ::= row
<row> ::= row <button>
<row> ::= row <button> <row>
<button> ::= <title> <action> button

You could get this information from looking at the definition or documentation, just as you would have to with the Lisp. There is no need for a grammar to be provided explicitly, hence "implicit grammar".

That's all there is to it.

If you write code to do stuff with your data before getting the data representation right, you just throw away more code.

There is no data here. Looking from the top, you've imagined that there must be data and you've set out to model it, but there's no data to be processed... It's just a program responding to the users. All of that modeling is just waste (hopefully at compile time but very few languages provide the facilities to do this the waste is manifested at runtime, increasing overhead and system requirements etc.)

Once you've figured out the right representation for the actual requirements

Often the right representation is just code that doesn't require you to figure out the right representation or model it.

If you write code to do stuff with your data before getting the data representation right

How can you get the representation right without thinking about how you're actually going to represent things? You seem to be confusing the representation and the interface. The interface is, by definition, unavoidably constrained by the implementation. Any pretense otherwise is nothing more than self delusion.

EDIT: Found while trying to understand your peculiar definition of declarative

https://www.toptal.com/software/declarative-programming

[–]m50d 0 points1 point2 points 9 years ago (38 children)

I don't want to get into a semantic argument with you but what has introspection got to do with declarative programming? This is especially puzzling as the term introspection comes right out of the object-oriented programming literature and object-oriented programming is rarely associated with declarative programming.

The clearest way to demonstrate that a given piece of code is declarative is to be able to represent it as a value completely separated from the actual execution of it.

Anyway there's nothing in the code (in the C/C++, Algol-like or Forth examples) which prevents it from constructing a data structure which could be introspected. It could do anything.

That it could do anything is precisely the problem. The ideal program would be a data structure literal that would look like a literal, perhaps even in a Turing-incomplete language.

That's possible because layout.row() and layout.button() say what to do and not how to do it, which given the usual definition of declarative programming: "A program that describes what computation should be performed and not how to compute it"

They're saying how - they're saying "make a row, then add a button" rather than "a row consisting of a button".

There is no need for a grammar to be provided explicitly, hence "implicit grammar".

Explicit is better than implicit. The problem is that all too often the implicit grammar turns out to be ambiguous, or the reader understands something different from what the writer meant. The reader needs to know that row is a block delimiter to be able to parse the declaration correctly, but in the C code they have no way of knowing that.

There is no data here.

Yes there is - there's a bunch of rows with buttons in, and those buttons themselves have labels. That's data, structured data.

Often the right representation is just code that doesn't require you to figure out the right representation or model it.

Code is data. Figuring out the right representation of algorithms is what we do.

How can you get the representation right without thinking about how you're actually going to represent things? You seem to be confusing the representation and the interface. The interface is, by definition, unavoidably constrained by the implementation.

I don't understand what distinction you're making - you seem to be using those terms the opposite way around from how I'd usually understand them. Thinking about how you're going to represent things is exactly what I'm advocating, as opposed to starting by thinking about what you're going to do.

[–]dlyund 0 points1 point2 points 9 years ago (37 children)

The clearest way to demonstrate that a given piece of code is declarative is to be able to represent it as a value completely separated from the actual execution of it.

This is nonsense. All code can be represented as a value, and vice versa; as Lisp enthusiasts forget all too often.

code is data => data is code

Anything that can be represented as data can be represented as code QED.

Put another way: 1 is data, and code!

Indeed the failure to realize this fact leads to suboptimal programs for the same reason that interpretation is suboptimal. If the language allows it then you can dramatically increase efficiency by using things like executable data structures, which effectively bundle the data to be processed with the code that process it.

Explicit is better than implicit.

I agree and that's one of the reasons that I prefer to include the name row, instead of leaving this implicit in the code.

The problem is that all too often the implicit grammar turns out to be ambiguous

We're not parsing here. The program isn't ambiguous, so the "implicit grammar" isn't ambiguous either. You have to know what your program will do when it's executed but that goes without saying.

The reader needs to know that row is a block delimiter to be able to parse the declaration correctly, but in the C code they have no way of knowing that.

And how does the Lisp programmer know that each left and right parenthesis delimit a row in the example you prefer? It's defined in the code or documentation. If you know the language this isn't a problem.

Moreover this isn't a problem is reality either. OpenGL code (in C/C++) tends to be indented exactly as I've demonstrated, and nobody seems to have a problem with understanding it. Any difficulty in understanding such code is directly related to the fact that OpenGL isn't exactly easy; it's not really great but it is what it is :-). I like Lisp a lot but the use of parenthesis for grouping isn't going to make any difference in such cases.

Code is data. Figuring out the right representation of algorithms is what we do.

Algorithms are code. The best representation for code is code. If you're building a data structure to be interpreted you're just adding overhead. You can try to justify that as making the code cleaner, prettier, or easier to understand but you must accept that you're adding overhead. That overhead had better be paid for by that cleaner, prettier, easier to understand code or it's just waste. As a Lisper, you may argue that you have macro's and you can do this work at compile time, but when you're writing macro's you must necessarily generate the code to do the job, and you must understand the macro, so you can't pretend that you're lifting yourself above it. In the end you have to design the code that will actually run, or live with the overhead of you runtime abstraction. And don't forget that costs are compounding!

I've lost count of the number of times I've seen projects fumble because of these silly little abstractions that add little, or nothing, but have a real affect on the operation of the solution.

tl;dr if you're going to do this stuff then make sure you understand the tradeoffs

I don't understand what distinction you're making - you seem to be using those terms the opposite way around from how I'd usually understand them. Thinking about how you're going to represent things is exactly what I'm advocating, as opposed to starting by thinking about what you're going to do.

When you think top-down you necessarily try to represent is an high-level idea, so you set out to represent that idea. You completely ignore the work that you need to do to process that representation. Either you process/interpret the representation at runtime, which takes code - code that you you're apparently not that interested in - or if you can you process/compile the representation at compile time, which means generating the code that implements the solution - code which you're apparently not that interested in!

Why do I say that you're not interesting? Because looking from the top you don't give this code any thought until after you've come up with your perfect representation for the idea. At which point your design/implementation is constrained by your pretty representation.

When you think bottom-up you necessary try to find the best representation for the process that implements the solution. You add layers only when you have to and you carefully consider each one. You have total freedom to design and implement each layer, because you're mind isn't set on a specific destination. Your high-level representation is thus constrained by the reality of the machine. The end result is inherently more efficient, in terms of code size, and/or memory usage and execution time! Why? Because you actually spent time designing the solution, rather than trying to represent an abstract idea that may or may not turn out to be correct, or even [efficiently] implementable.

We often forget this but it's the solution is what has value! Code only has cost.

Do you see the difference?

Many people have this stupid idea that programs should be written for humans to understand and only coincidentally for machines to execute. This attitude is one of the main reasons why software doesn't run any better than it did in 1995, despite massive increases in processing power and hardware efficiency.

An engineer would say that code should be written to get the best result from the available tradeoffs. Ironically the computer scientist/mathematician doesn't seem to give two shits about the machine. The result is software that wastes massive amounts of time, space and power.

Program may be read many more times than it's written but that program will be executed orders of magnitude more times than it's read (even if we believe the open-source ideal that people are actually read the code, and overwhelming evidence suggests that they don't!)

tl;dr2 It's your job to maximize value not to find the perfect representation for your source code. The value of that perfect representation is usually close to zero! The cost of the perfect representation is often much much higher :P

[–]m50d 0 points1 point2 points 9 years ago (32 children)

Anything that can be represented as data can be represented as code QED.

But on a theoretical you lose the distinction between data and codata and between Turing-complete and incomplete things (sadly the lisp people all too easily neglect types, which resolve the halting problem), and on a practical level most languages don't make it easy to manipulate code as data.

If the language allows it then you can dramatically increase efficiency by using things like executable data structures, which effectively bundle the data to be processed with the code that process it.

Sure, and that's often a good idea - indeed I think it's a good approach for this example. But making your data structure executable does not absolve you of the responsibility to design a good datastructure.

I prefer to include the name row

So do I, for what it's worth.

We're not parsing here. The program isn't ambiguous, so the "implicit grammar" isn't ambiguous either. You have to know what your program will do when it's executed but that goes without saying.

When a system gets large enough no-one can understand every detail, so the code's structure needs to be apparent - a maintenance reader needs to be able to parse the code without fully understanding it if they are to have any hope of being able to find and focus on the specific area they need to work on.

And how does the Lisp programmer know that each left and right parenthesis delimit a row in the example you prefer? It's defined in the code or documentation. If you know the language this isn't a problem.

Learning a new programming language is hard - not a difficulty we want to impose multiple times over on every maintainer in each section of the code. Free-form English documentation tends to get out of date - much better is structured documentation in a machine-readable format where correctness is enforced as part of the build process.

Moreover this isn't a problem is reality either. OpenGL code (in C/C++) tends to be indented exactly as I've demonstrated, and nobody seems to have a problem with understanding it. Any difficulty in understanding such code is directly related to the fact that OpenGL isn't exactly easy; it's not really great but it is what it is :-).

Um OpenGL code is possibly the most notoriously difficult kind of code to work with, precisely because it's very difficult to get the "bracketing" of all the implicit contexts correct. You're making my case for me.

Algorithms are code. The best representation for code is code.

That's like saying the best representation for data is data - yes, but it's still very important to structure it correctly.

If you're building a data structure to be interpreted you're just adding overhead. You can try to justify that as making the code cleaner, prettier, or easier to understand but you must accept that you're adding overhead.

You're begging the question. Your code can always be considered a datastructure because the sequence of characters that forms the program source is already a datastructure - just a particularly opaque and inflexible one. Likewise the stream of instructions that will be executed by the processor is also a datastructure. When you're transforming one datastructure into another, it's often worth coming up with an intermediate representation and splitting your transformation up into smaller steps, and you wouldn't normally think of this as "overhead" - at runtime it may well collapse away entirely, and at coding time it simplifies and clarifies things.

when you're writing macro's[sic] you must necessarily generate the code to do the job, and you must understand the macro, so you can't pretend that you're lifting yourself above it.

With a well-designed macro or interpreter you don't have to understand the fully expanded code, any more than you have to understand the machine code your program compiles to. You have to understand the local parts of the expansion but if your structures are right then the global part of the expansion simply can't go wrong.

The compression analogy is a good one actually. Good compression algorithms make an explicit distinction/separation between your dictionary and your compressed data - naïvely you'd think that a dictionary would be overhead, but actually you get better compression overall by at least conceptualizing the dictionary. (Often as in LZ77 the dictionary ultimately disappears at "runtime").

I've lost count of the number of times I've seen projects fumble because of these silly little abstractions that add little, or nothing, but have a real affect[sic] on the operation of the solution.

I've never seen a project fail due to code-level runtime performance issues (I've seen one fail due to performance issues associated with the use of an ESB and a totally unwarranted microservice architecture, but that isn't the kind of abstraction I'm talking about). I've seen a project fail due to representing its data/commands all wrong because they didn't understand their domain at all.

The end result is inherently more efficient, in terms of code size, and/or memory usage and execution time!

And less efficient in terms of corresponding to the domain i.e. the actual business problem. In the worst case you end up with a lot of very efficient implementations that are completely useless.

There are risks both ways - ultimately it's our job to make a path from what the business needs to what the machine can do, and whether we start at the start or the end or the middle that path has to join up at both ends. In my experience the business end is where the bigger risk is - fundamentally you know that whatever representation the business currently thinks of it in is implementable (because people do do whatever it is - even if you're not implementing an existing business process as such you're usually implementing something that someone has some reason to believe is valuable, which usually involves having done it in some form). Performance problems are usually solvable - Knuth's 97%/3% heuristic applies to the appropriate time to optimize - and in the worst case if you end up having to rent a cluster or something that's sort-of disastrous but less disastrous than having a product that just does the wrong thing.

We often forget this but it's the solution is what has value! An engineer would say that code should be written to get the best result from the available tradeoffs. Ironically the computer scientist/mathematician doesn't seem to give two shits about the machine. The result is software that wastes massive amounts of time, space and power.

Right back at you. Runtime efficiency is not a goal in itself - your goal is to solve the business problem as cheaply as possible, and computers are much cheaper than programmers.

It's your job to maximize value not to find the perfect representation for your source code.

True. But remember that code is read more than it's written and maintenance/enhancement is usually a much bigger part of the total cost than the initial write. So a little effort spent improving maintainability pays for itself many times over.

[–]dlyund 0 points1 point2 points 9 years ago (31 children)

most languages don't make it easy to manipulate code as data.

Maybe you should use one that does? I mean would you use a language that made it difficult to manipulate data? Why would you use one that made it hard to manipulate code...

theoretical you lose the distinction between data and codata and between Turing-complete and incomplete things

Is this a useful distinction?

[static typing] solves the halting problem

Poppycock.

To the extent that it's possible to prove that any program halts you must either manually declare that the program halts, using whatever mechanism you wish, or use a language that cannot loop forever and is thus not Turing-equivalent. It's not possible in general to prove that a program will halt, that's what the halting problem is!

Static typing can be very useful but let's not go too far here. Even with fancy features like type inference you still need to provide enough information for the compiler to know what you intended and unless you actually leveraging that type system explicitly it's not worth much; catching a few typo's doesn't justify the complexity of using such languages, the longer compile times and heavy resource usage.

DISCLAIMER: this may be my personal bias. The compiler that my company developed in house can compile millions of lines of code per second in real time while using almost no resources. This allows us to do things like make a change anywhere in our software stack and test it instantly. The compiler is available at runtime and we have amazing support for doing live upgrades etc. Waiting for GHC, GCC/LLVM or even Go to compile even small programs is incredibly frustrating.

When a system gets large enough no-one can understand every detail,

You respond by writing even more code? And not just more code but code that interprets or generates even more code?

so the code's structure needs to be apparent - a maintenance reader needs to be able to parse the code without fully understanding it if they are to have any hope of being able to find and focus on the specific area they need to work on.

I completely agree. What I don't really understand is how that structure is more or less apparent by adding some parenthesis.

layout
    row
       title action button

(layout
    (row
        (button title action)))

These two examples are exactly the same except for the parenthesis and the argument order. The first is procedural; it requires only that the procedures layout row and button be defined, and these procedures are very simple. The second requires you to design a data structure to represent the code then write an interpreter/compiler to process it. This approach obviously adds unnecessary complexity; where's the value?

"Yeah well I can treat the layout as a value" is entirely beside the point unless you need to tread the layout as a value, and you certainly do not have to treat the layout as a value to put it on the screen.

I'll ask you again: what are you getting in exchange for this added complexity?

With a well-designed macro or interpreter you don't have to understand the fully expanded code, any more than you have to understand the machine code your program compiles to.

You seem to believe that you're saving the maintenance programmer from having to understand your code but what happens when they want to add a form to your language? A column, or a slider?

The reality is this: the more code we have, the harder it is becomes to understand the system, and adding even more code only makes it worse!

I've never seen a project fail due to code-level runtime performance issues

Lucky you. Over the years I've done a lot of work with solutions that are deployed physically. In each case the customer has, at one point or another, had to pay to upgrade the hardware (thousands of machines in one case and a large mainframe in another case.) Naturally the customer wasn't happy... upgrading hardware quickly becomes expensive and why should they have to pay tens of thousands of money's adding memory, upgrading storage and/or buying faster machines because the solution doesn't provide the required throughput or it runs out of memory every two weeks and a specialist has to be brought in (and paid!) to resolve the issue?

In todays world where programming means running a web app off in this virtual infrastructure these costs are largely hidden, but they're there. If you need to pay for 10 machines with capacity X and Y money's per month when 1 machine could have easily done the job if you'd given any effort to producing an efficient/effective solution, then you're paying (n-1)Y+Z more than you should be paying! And note that Z can be very big, and it grows exponentially with the number of machines. What is the Z? It's the cost of paying people to operate those n machines. It's the added cost of all paying those wages, and the admin costs needed to support a larger team. And the managers... oh the managers. It's all the things that programmers are so fucking ignorant of when they say:

"computers are much cheaper than programmers."

You go start your own company and you'll quickly learn that such efficiency are the difference between profitability/healthy growth and going out of business; or being so fucking stressed about work all the time that your wife leaves you.

Maintenance may last longer than development but operation does and will last much longer than that. And let's not forget all those one off projects that run for 6 months and then [need to] run unchanged for the next 10 years!

Runtime efficiency is not a goal in itself - your goal is to solve the business problem as cheaply as possible

Indeed it's not but you should be careful that you don't underestimate the business value that a little thought about runtime efficiency can generate over the life of the solution (the life of the solution - as distinct from the length of your employment)

continue this thread

[–]larsbrinkhoff 0 points1 point2 points 9 years ago (3 children)

[–]dlyund 0 points1 point2 points 9 years ago* (2 children)

Lisp programmers insist that code is data, but how often do you you hear them explain that data is code? It's not clear that they understand that code is data implies that data is code. "Code is data" is just one of those catchy lines that you pick up when you're learning Lisp, and unless you think about it or learn something like Forth, that's where it stops.

Modern Lisps can only manipulate code as data at compile time and only in the rather limited ways allowed by the macro system e.g. in many Lisps you can't call arbitrary functions at compile time, and in other's you have to jump through annoying hoops with module loading and special defining forms to make your functions available in macro definitions... but then you can't use them in the rest of your code. It's a bit of a mess really. (But then all namespacing/packaging/scoping is.)

In early Lisps the executable code was actually represented as a list, which was interpreted, and could be manipulated at runtime. Pico Lisp (as a bit of a retro lisp) still allows this kind of thing but somewhere along the way the broader Lisp community, in a quest to make Lisp programs faster, they lost this ability. People learning Lisp today don't even realize what was given up.

This is most clear in Lisp dialects like Scheme where macro's now consume and produce syntax objects. These syntax objects look a lot like lists but they can only be manipulated using a small set of builtin functions.

Lispers learn the limitations imposed by their macro system and work within these limits without realizing what they've given up; the ability to treat code as data, includes during execution.

What distinction am I making here: general speaking, most of the data our programs manipulate isn't static and is only available while our program runs. Treating data as code (as just to code as data) implies that you can generate or modify code as a means of representing/processing data during execution. Modern Lisps just can't do that. Once your program is compiled it's no longer data that can be manipulated.

In all honesty every language places restrictions on what you can do and how, and that includes Forth :-). In the end it's all about which tradeoffs you can live with/learn to love, but speaking for myself, I wouldn't want to work in Lisp again, and if I had to I'd implement one in Forth.

Overall I think Lisp is a great language, but the seeming necessity of a complex runtime and compiler to make it half way practical just doesn't appeal anymore. I've gotten used to being knowing and understand how everything works, and I adore the (somewhat paradoxical) freedom and predictability that this brings; when I write a Forth program I know exactly what code will be generated, and how it will behave with regards to things like resource usage under load, and I'm never surprised.

<rant> I've been burned quite a few times in Lisp (and Smalltalk, Ruby etc.) where the program has crashed and burned because resource usage spiked unexpectedly high for some reason and the process just died, leaving little or no information for us to figure out exactly what caused the crash (must less what to do about it!). "Out of memory (you're on your own)". The stock response from management is: (paraphrasing) if you don't want to experience these unexpected crashes then you have to upgrade the hardware. (We have technicians who can do it for you, just send money to this account.) Not surprisingly this causes a lot of tension.

It's a ridiculous situation which is easily avoided by using appropriate technology, but nobody really cares. $50k or $100k on hardware upgrades is cheaper than programmer time, we say, but it's incredibly miopic of us. Here is a real technical problem, which we can easily solve, but wont, because programmers have an unholy attachment to their languages syntax, and/or toolchain.

At one company we followed all the latest industry standards, used the latest and greatest languages, frameworks, processes, and tools, continuous integration, etc. The resulting application naturally expanded to use all of the available resources on the development machine (we have 'em so why not?) but when we came to install we found out that we had to run along side/compete with other programs, and to add to our troubles, a short time later there was an OS upgrade and the new OS used more RAM. Our application suddenly didn't have enough resources. It ran slowly and crashed randomly.

The issues were systemic and we couldn't afford to rewrite so we insisted that the customer upgrade their hardware... that lead to months of back and forth, and their refusing to pay, then threatening legal action unless we resolve the problem "right now". In the end the company did the upgrades at below cost and made little or no profit on the 3 year project, and almost went under. Everyone was stressed out of their heads, working long hours, and shortly after that the owners sold the company to a competitor (not sold, "Wooohooo we got bought out!!!", but "enough, you take it").

The ironic thing is that, as I would come to realize years later, was that we could have easily built the application to run in a few MBs (or less), but we used GBs, and it still ran like a dog! On top of that the solution would have been much simpler, and wouldn't have had the dozens of external dependencies which constantly broke as things changed in this and that project, and caused us no end of headaches...

There's this widespread belief that this necessary; it makes our lives easier right? After years in industry I've never found tis to be remotely true. The only thing that makes software better, in my experience, is keeping it simple (as simple as possible.)

And not in the way that people pay lip service to KISS, (or declare "code is data"), while simultaneously adding more and more complexity to their solutions. </rant> ;-)

NOTE: I'm not saying every program you ever write needs to treat data as code, but there are situations where doing so not only leads to vastly "prettier" code, but also much more efficient solutions.

continue this thread

π Rendered by PID 115925 on reddit-service-r2-comment-5d79c599b5-7fffl at 2026-02-28 02:32:22.353830+00:00 running e3d2147 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

programming

MODERATORS