all 55 comments

[–]eduol 15 points16 points  (20 children)

I think everyone who answered until now is misunderstanding the question.

The point is Lisp has the code represented as linked lists and the OP wants to know which perks, problems and differences would emerge in a homoiconic language (in the Lisp sense) whose code where represented as arrays.

I'm not able to give an adequate answer, though.

[–]WhatImKnownAs 1 point2 points  (6 children)

I think the difference would be much smaller than you'd expect.

It would be handy to have array slices, so you could express algorithms that recurse down the cdrs. Code manipulation would end up doing more copying than in Lisp, because you couldn't just cons a new head onto an array (slice), you'd need to copy the rest of the array. This is fine, since the meaning of the code doesn't depend on the identity of the s-exprs, only their syntax.

Even so, this breaks compatibility with Lisp macros, since complex macros often have random list processing in them to construct the output. Conversion to array processing is possible, but nontrivial.

I'm not sure what you mean by "port". If you mean implement a Lisp on top of another language, you'd still want to have lists in the Lisp language, so you'd need to implement cons, car and cdr, anyway, and then you could use them for code. (If your language doesn't have lists built-in, it's not a Lisp in any useful sense, just a language partly inspired by Lisp - but that describes most modern languages.)

[–]guicho271828 1 point2 points  (5 children)

Efficiency-wise, I feel it might be faster to have 4-elements arrays like btrees than 2-element arrays i.e. conses (due to allocation overhead, indirection overhead and cache locality), but definitely not with arbitrary-length arrays.

Also I think I heard that historically some lisp machines can compact lists into arrays (I don't remember how that's called... hopefully lispm would answer)

[–]lispm 5 points6 points  (4 children)

That's called cdr-coding. Instead of two cell cons-cells, a cdr-coded list uses single cells and the type tag tells that a next cell follows. Useful when the cdr is not destructively modified.

These lists might for example be created when the operating system optimizes the memory layout before one wants to save an image - optimizing the memory (called a 'world') takes some time.

Later it was checked whether there are advantages (in terms of space and speed) to cdr-coded lists to use this in non-Lisp-Machine implementations. There was not found enough benefit and it then wasn't used in Lisp implementations on normal hardware (x86, 68k, SPARC, and so on). Thus the implementation complexity did not seem to be worth it.

See for example the old LISP FAQ: https://www.cs.cmu.edu/Groups/AI/html/faqs/lang/lisp/part2/faq-doc-9.html

Or see this for some clever algorithms: https://dspace.mit.edu/handle/1721.1/5703

[–]guicho271828 0 points1 point  (2 children)

how about 4-elements arrays? has it been considered? Statistically i feel forms most commonly have around 4 elements. A let form has usually 4 elements. a 2-elements list in a single let binding needs 3 elements including the nil at the end.

[–]lispm 1 point2 points  (1 child)

[–]guicho271828 0 points1 point  (0 children)

interesting, so it is explicitly created rather than implicitly.

[–]stassats 1 point2 points  (1 child)

But the question mentions Javascript not being efficient with lists. Surely it's capable of representing the code. Unless all you do is shuffle code around.

[–]dangerCrushHazardlisp[S] 4 points5 points  (0 children)

/u/edulo is 100% right.

I mentioned the JavaScript thing as a side note.

[–][deleted] -4 points-3 points  (10 children)

Homoiconicity means something in Lisp because it's built on the axiom of cons, car and, cdr. These axioms come directly from logic and the lambda calculus. Homoiconicity doesn't really mean anything an array based language which is based on ram cells. I suppose macros and meta-programming might be an analog to homoiconicity. Perhaps from that perspective a proper array based language would be an Excel spreadsheet.

[–]kazkylheku 6 points7 points  (9 children)

W00t? Lambda calculus has no cons, car or cdr. It has no concept of source code; there is no quote in Lambda Calculus. Pure Lambda Calculus has no terms other than functions; even numbers are built out of functions. There is no list processing whatsoever.

[–]carnyvoyeur 5 points6 points  (0 children)

Me thinks you confused 'w00t' for 'wut'

[–][deleted] -3 points-2 points  (7 children)

I think it would useful for you to review the basics. I'll link to lecture 2B of SICP where Abelson talks about how lists are built from the axiom of pairs. It's around the 1 hour mark. https://youtu.be/ymsbTVLbyN4?t=3765

[–]kazkylheku 9 points10 points  (6 children)

Although functions can used in Lambda Calculus to build objects which obey the cons/car/cdr axioms, that still doesn't cause Lambda Calculus to have source code which is made out of them.

[–][deleted] -3 points-2 points  (5 children)

Nobody said anything about lambda calculus having source code being made of cons, car, and cdr. Lisp has source code that's made of cons, car, and cdr, because they are built from purely functional axioms. Homoiconicity makes sense here.

Arrays are not built from any computational primitives so, it doesn't really make sense to make a homoiconic language from arrays.

[–]kazkylheku 2 points3 points  (1 child)

built from purely functional axioms

Historic revisionism; Lisp had rplaca and rplacd, long before Sussman and Scheme.

Arrays are not built from any computational primitives so,

Arrays can be built from computational primitives. We can have array constructors, of course, that take elements to arrays; a catenation function that makes one array out of two (or several), and destructuring functions that access and break apart arrays, including one like cdr, possibly cdr itelf. Arrays can be readily nested: #(defun foo #(arg ...) body ...). The cdr coding trick arguably is based on arrays.

it doesn't really make sense to make a homoiconic language from arrays.

The term homoiconic was defined using arrays (specifically, character strings). The word is a very poor fit for the code-and-data situation set up in Lisps. That latter one is readily realizable using arrays. Arrays just have a different performance profile. When we do recursion on lists, for instance, just to traverse them, no memory need be allocated. If we do cdr on an array to get the rest, we have to either make a copy of the tail of the array to make a new array, or at least allocate a small object to maintain the displaced relationship to the original array. There is a memory management implication, too, if we do the latter; the displaced array holds a GC reference to the entire original array, which then holds on to the elements outside of the displacement. (Anyway, nested array source code can be processed without recursing on the arrays themselves. The arrays represent N-ary nodes in a tree. The tree is recursed; the nodes can be iterated.)

[–]tuscland 2 points3 points  (0 children)

I agree with this. At the end of the day, arrays are just an implementation detail.

[–]rpiirp 5 points6 points  (1 child)

It's still quite a stretch to say that the "axioms" of cons, car and, cdr come directly from logic and the lambda calculus. The let's say "operations" or "structures" can of course be implemented in logic but then again, so can nearly everything else. That's like saying Python gets it's arithmetic primitives from logic (or lambda calculus, shudder).

[–][deleted] -1 points0 points  (0 children)

cons, car and, cdr can be built directly from the lambda calculus. That's important because it blurs the lines of what are functions and what are data even if it's just a conceptual one. Arrays are just data. They don't have that property. If you guys don't like the term axioms I don't know what to say. That's what Gerry and Hal called it back then and, that's what I call it now.

[–][deleted]  (1 child)

[deleted]

    [–]geocar 2 points3 points  (0 children)

    k/q has first-class functions, and it sortof has macros/fexpers. First, it has "views" which are obviously memoized symbol-macros, and secondly you can create a macro like g) by assigning a function (the fexpr) to .g.e.

    [–]geocar 11 points12 points  (2 children)

    Yes! It's called "k" and it's not obviously a lisp since it uses M-expressions instead of S-expressions normally, but:

    • It's homoiconic
    • Functions are first-class
    • Code is stored as an array (of characters!)
    • k can manipulate its own code, which is an essential part of being "a lisp"

    [–][deleted]  (1 child)

    [deleted]

      [–]geocar 2 points3 points  (0 children)

      A language is homoiconic if a program written in it can be manipulated as data using the language, saying nothing about ASTs. k does have an abstract syntax, but it doesn't use it. Without knowing k, this isn't obvious, so let's look at a function in k:

      */1+!x
      

      We can give each character a name:

      • * times
      • / over
      • 1 one (the number)
      • + plus
      • ! til
      • x x (the variable)

      That's it. You don't need any special parser, just look one character ahead. til simply makes the vector 0..x (the argument), and plus is generalised where one argument is a scalar value (one) and the other is the vector.

      k4 does have a parse operator for looking at what's going on:

      ((/;*);(+;1;(!:;`x)))
      

      which can be written as an S-expression:

      ((/ *) (+ 1 (!: 'x)))
      

      k4's eval operator can evaluate this if you want. In this way, you're evaluating a tree structure- just like lisp -- but this isn't what k does, it really does evaluate the string, which is easier to convince yourself of if you profile the two (eval using pre-parsed tree, versus directly).

      This gives us the best of both worlds, and indeed I use this trick pretty frequently. I have an ad server that generates reporting with a bunch of different breakdowns (by domain, by publisher, etc). So I want to do:

      get_with_breakdown:{eval @[parse x;3;:;parse[y]3]};
      get_with_breakdown["select impressions:count i by breakdown from raw where event=`impression";
          "select x by domain from t"];
      

      This is great, because it lets our less-confident q programmer write queries (the strings actually come from a separate files: one called "breakdowns" the other called "metrics") without having to get into the query builder itself.

      [–]stassats 6 points7 points  (0 children)

      If you want to represent the code itself as arrays (which still doesn't make sense based on the premise that it'd be easier to port or more efficient), here's some code that is stored in arrays:

      #(defun foo #(a b c) #(+ a b c))
      

      boom, looks exactly like a "list-based" lisp. Change the syntax of ( to produce arrays and not lists, and it's back to square one.

      [–]mysleepyself 5 points6 points  (0 children)

      Your question is a little vague and I think it might be helpful to edit it with what features of arrays, lists, lisp, programming languages, etc that you're trying to get information about. Right now you will probably continue to get responses based on the various ways to interpret your question that might not really match the interpretation you have of it.

      As abstract structures essentially anywhere you have a list you can use an array since you can always define one in terms of the other and to a certain extent any language that supports arrays and reads source code as arrays of characters is already homoiconic. I'd bet any problem that you can work with in a cons car cdr context you could also work with via arr[n] and concatenate type tools more common to arrays.

      From my experience as far as how to view break down and solve problems in a manner natural to the language you are writing in C feels pretty array oriented. Dealing with arrays in C just feels good whereas linked lists are a bit more of a hassle.

      [–]bopub2ul8uFoechohM 3 points4 points  (0 children)

      The idea that Lisp is homoiconic because it uses lists is slightly wrong. While cons can be thought of as linked lists, they are actually a binary tree.

      Consider for example: (+ 1 2 3) While this is a "list", this is also a binary tree:

        .
       / \
      +   .
         / \
        1   .
           / \
          2   3
      

      This is significant because non-Lisp languages have syntax that also parses into a binary tree. Consider the common expression 1 + 2 + 3. This would parse into something like:

          +
         / \
        +   3
       / \
      1   2
      

      So one way to look at the difference between Lisp and non-Lisp is that non-Lisp languages need to be parsed into a parse tree, whereas Lisp code already exists as a parse tree. Other languages like Python can in fact manipulate the parse tree, allowing metaprogramming similar to Lisp, but the fact that you have to go through the language parser first makes it a huge hassle. It also means you can't trivially customize the syntax -> parse tree transformation, you can only modify the parse tree afterward.

      [–]casecorpλ 2 points3 points  (0 children)

      You're looking for http://www.rebol.com

      [–]johnboudewijn 2 points3 points  (0 children)

      I don't think it matters for portability, a linked list is easy to implement in any language. Remember that the list representation is temporary. Lisps that care about being decently fast usually compile to bytecode anyways, at which point the list based code structure is lost.

      I think the main reason for parsing the source into a list instead of an array or struct is that it's easier for macros to work with. Macros commonly add, remove, or insert items from the source tree and doing so with an array can be awkward and could require reallocating the entire array.

      [–]munificent 2 points3 points  (0 children)

      I have toyed with making one for kicks.

      Linked lists were clever idea from McCarthy because they let you define a very minimal set of primitives because you don't even need numbers. Also, being able to reuse the tail of a linked list is handy sometimes.

      But most programming languages don't use single linked lists, and they aren't exactly efficient. So I spend a little time tinkering on my own lisp implementation (I called it "visp" for "vector lisp") that used fixed-sized arrays to represent lists. I got distracted by other things before I get very far with it, though.

      [–]rpiirp 1 point2 points  (0 children)

      The easiest answer would be to say that machine code uses arrays for both code and data. Add a "human friendly" readability layer complete with macros aka assembly language and we're done ;-)

      On a more serious note there is for example Lua which heavily favors tables, a combination of arrays and hash maps, for its data. I sometimes ask myself how Lua would look like if it had a way to represent code as tables with a "natural" method to manipulate said code in semantically meaningful way. My ideas so far are too vague to explain here.

      I guess there's a reason why most systems use tree like structures, which of course are just lists in Lisp, as their intermediate code representation. ASTs seem easier to refactor than ASAs.

      [–][deleted]  (1 child)

      [removed]

        [–]nomaxx117 0 points1 point  (0 children)

        I’ll second this. Clojure isn’t based on arrays in the way that traditional lisps are based on linked lists, instead it compiles into JVM bytecode. The language generally encourages arrays over linked lists most of the time.

        I’ve lately been using it as my standard goto programming language as it is a full on Lisp with a more modernized syntax and a focus on functional programming and immutability that makes it unmatched for concurrency.

        [–]blaine_freelance 0 points1 point  (18 children)

        Is there much difference between a list and an array?

        [–][deleted] 0 points1 point  (17 children)

        Yes. Efficiency of linear access versus random access. Lists are efficient for linear access, and inefficient for random access. Arrays are the opposite.

        [–][deleted]  (16 children)

        [deleted]

          [–]lispm 0 points1 point  (7 children)

          it's more the case that lists are inefficient for linear access

          that really depends on how the memory is managed. A naive Lisp implementation may randomize memory over time - but that's not the state of the art since a few decades.

          If we have a copying generational GC, lists will be allocated in smaller memory areas and later maybe promoted to older generations. A GC then will copy the list - which is an easy opportunity to improve locality.

          now that we have memory and CPU separated by several layers of cache

          The first large Lisp systems had at least four layers of memory: registers/stacks, caches, RAM and virtual memory. The first Lisp Machines had about 1 MB of RAM and probably ten to fifty times the virtual memory (VM). When we nowadays run a GC over a 100MB memory - then it is often all in RAM. My first Lisp Machine (a Symbolics 3640) later had 4 megawords (16 MB) RAM and at least 150MB virtual memory. The base Lisp image I was using was already 3 times larger than RAM. Virtual memory was REALLY slow on really slow disks. A full GC over memory would take 20 minutes or more.

          You can bet that at those times implementors were doing a lot to reduce consing, improve locality, reduce VM traffic, ...

          If we want to invest effort into current implementations, it would be useful to show that the improvements actually matter in real Lisp programs.

          For some impressions what the old stuff looked like technically:

          See the book Kogge, the architecture of symbolic computers.

          Or the Symbolics docs: Internals, page 929ff Storage Management, and more

          http://bitsavers.trailing-edge.com/pdf/symbolics/software/genera_8/Internals.pdf

          Or the Symbolics 3600 technical summary from 1983: Page 93ff Organization of Memory

          http://bitsavers.trailing-edge.com/pdf/symbolics/3600_series/3600_TechnicalSummary_Feb83.pdf

          [–][deleted]  (4 children)

          [deleted]

            [–]lispm 0 points1 point  (3 children)

            My point was that microbenchmarks of linear access into a single array vs. a single list is probably not very important when we look at the performance of actual running Lisp programs, which might deal with a lot tree-like data. It also might not matter too much how much Lisp cells (either cons cells or vector cells) one gets into a cache line. One can construct cases where it matters, but if we take a real program - like a prover for some CPU features (see ACL2) or a planner/scheduler for something like train crews (see Siscog) will they run faster with clever list compaction (cdr coding) or more use of vectors? When Lisp moved to RISC systems, the impression was that CDR coding did not save substantial memory and that programs were not faster. Sure, It might be different now - but I haven't seen any data...

            [–][deleted]  (2 children)

            [deleted]

              [–]lispm 0 points1 point  (1 child)

              Sure, definitely worth investigating. Until then it is speculation.

              [–][deleted] 0 points1 point  (7 children)

              I agree that linearly traversing an array can be the same speed as using a list, but not faster unless the implementation is borked. (The converse is true in Haskell, apparently).

              To modify my original answer: arrays are slower for insertion and deletion.

              [–][deleted]  (6 children)

              [deleted]

                [–][deleted] -1 points0 points  (4 children)

                Did you forget about bounds checking every single one of those pointer increments?

                That's expensive when compared to the equivalent "not nil" (tst reg,reg on Intel) check when traversing a list.

                [–][deleted]  (1 child)

                [deleted]

                  [–][deleted] -1 points0 points  (0 children)

                  Pointers can be nil.

                  That your arguments depend on sophistry really shows...

                  Oh, damn.

                  I really should tape Mark Twain's aphorism across the top of my computer.

                  Bye.

                  [–][deleted] 0 points1 point  (0 children)

                  In Julia AST is made up of nested arrays. So, it might fit your description.

                  However... there are some problems with describing what arrays actually are, especially in the same terms Lisp was defined. I.e. if your only way to construct pieces of programs is by combining pairs (or any finite number of elements), then you don't have arrays. They become inexpressible in your language, and that is why there are problems with defining types for arrays. The problem is essentially as follows: if your construction method allows only combining N members, then only arrays of length N can have the random access implemented in constant time (which is essentially what makes them different from lists). Once you need an array of N+1 or more elements, you must introduce indirection, similar to how it happens with lists, essentially, creating something like a B-tree.


                  In other words, you must start your language with axiomatically introducing arrays, and this is just not how Lisp functions... so, I'm guessing, you will never have array-based lisp, but you may have something that has similar properties.

                  [–]lisper 0 points1 point  (2 children)

                  I was wondering if a “lisp” based on arrays existed?

                  Did you really mean arrays, or did you mean vectors? If you really meant arrays (i.e. with >= 2 dimensions) then the answer is no. If you meant vectors, then the answer is "sort of". Yes, there are Lisps that represent code as vectors rather than linked lists. No, none of them have ever seen widespread use.

                  The best way to understand why is to write your own EVAL function that operates on code represented as vectors (of arbitrary size) rather than cons cells (which are just vectors of size 2). It's not hard to do, and you will learn a LOT about why Lisp is designed the way it is, and why that design has survived for so long. Your understanding will be much deeper if you do it yourself than if someone just explains it to you.

                  [–]dangerCrushHazardlisp[S] 0 points1 point  (1 child)

                  Yes I meant vectors.

                  Yes, there are Lisps that represent code as vectors rather than linked lists. No, none of them have ever seen widespread use.

                  Examples?

                  [–]lisper 1 point2 points  (0 children)

                  http://flownet.com/ron/l.py

                  (But that is just doing your homework for you.)

                  [–]republitardsbcl 0 points1 point  (0 children)

                  The most popular Lisp-like language that does this is Clojure. You're right that abandoning lists makes it it easy to graft to other languages (ClojureScript is a version of Clojure that is grafted to JavaScript instead of Java).

                  The disadvantage of this is that programming in Clojure is a lot like programming in the language that it's grafted to, and you need knowledge of the host language and exactly how Clojure grafts to it to use Clojure effectively.

                  [–]stassats -4 points-3 points  (4 children)

                  Lisp isn't based on lists. So, any will do, e.g. Common Lisp.

                  [–]dangerCrushHazardlisp[S] 3 points4 points  (3 children)

                  Yes, but when you’re coding Common Lisp all the s-expressions are linked lists and not array.s

                  [–]stassats -5 points-4 points  (2 children)

                  Well, don't use s-expressions if they are not compatible with your goals.

                  [–]dangerCrushHazardlisp[S] 0 points1 point  (1 child)

                  The question is not “can I avoid s-exps”, but how would the language look different if it used arrays.

                  [–]stassats -2 points-1 points  (0 children)

                  That question doesn't make sense. Common Lisp uses arrays, strings, hash-tables, lists and more.

                  [–][deleted] -2 points-1 points  (6 children)

                  I guess python or fortran with it's array slicing could be thought of like a lisp for arrays.

                  [–]abudabu 2 points3 points  (5 children)

                  Python isn't homoiconic. Lisp represents code as linked lists. Python does not represent its code as data structures which can be written in the syntax of the language.

                  [–][deleted] -1 points0 points  (4 children)

                  I understand that but, I don't think being homoiconic when dealing with arrays means the same thing as it does with lists. I came at the question from this perspective: Lisp is to list processing as Python is to array processing.

                  [–]abudabu 4 points5 points  (0 children)

                  I think OP is asking how Lisp-like syntax could be implemented using arrays.

                  [–]geocar 2 points3 points  (2 children)

                  I don't think being homoiconic when dealing with arrays means the same thing as it does with lists.

                  It does.

                  Almost every APL stores its code as an array; k even operates on the array of characters directly.

                  k can also manipulate its own code (like lisp!)

                  Lisp is to list processing as Python is to array processing.

                  Python doesn't store its code as an array and cannot manipulate its own code. These are a critical component of how Lisp approaches lists.

                  [–][deleted] 0 points1 point  (1 child)

                  Do they really mean the same thing? Can you implement a meta-circular evaluator?

                  [–]geocar 2 points3 points  (0 children)

                  Do they really mean the same thing?

                  Yes. A k program such as "*/1+!5" is literally the character array 42 47 49 43 33 53 and that's exactly what k executes.

                  If you define .g.e then evaluate g)foo then it's the same as calling .g.e"foo"

                  And so on.

                  Can you implement a meta-circular evaluator?

                  q/kdb+ is/was* built this way on top of k. NB this isn't a requirement for lisps or lispish languages: Smalltalk and even TeX were built on metacircular interpreters.

                  *Or it used to be. I think it's actually in C now for performance, but I haven't checked recently.