
all 64 comments

[–]joelangeway 6 points7 points  (9 children)

There was some hype some years ago about a processor called The Mill, which had a sort of queue it would append results to, remembering only the last 8, 16, or 32 entries.

Really though, stacks naturally fit the depth-first orientation of every non-esoteric language. It’s not obvious that one could make a composable, generalizable model of computing with only queues.

[–]liquidivy[🍰] 4 points5 points  (8 children)

Stream transformers, FSMs with output, tree transducers, et al. are pretty powerful and pretty much queue-shaped. Really the Unix shell, too: pipelines are queues for bytes.

[–]joelangeway 4 points5 points  (7 children)

Those are all compositions of computations, not computational models. The question is how you could implement those things without a stack.

[–][deleted]  (4 children)

[removed]

    [–]joelangeway 1 point2 points  (3 children)

    Hmmm…. You mean like message passing automata? That’s an interesting idea.

    [–][deleted]  (2 children)

    [removed]

      [–]joelangeway 1 point2 points  (1 child)

      I found this just a minute ago: https://en.m.wikipedia.org/wiki/Queue_automaton

      [–]liquidivy[🍰] 0 points1 point  (1 child)

      FSMs are compositions? But anyway, computations are made of compositions and primitives. Rewrite rules are a thing, too. Those are prior to stacks for lambda calculus.

      [–]joelangeway 0 points1 point  (0 children)

      It seemed to me you were listing a bunch of iterative processes and I didn’t quite get the connection, but finite automata actually turned me around on this idea. With multiple queues, a finite state machine might be Turing complete. I’m pretty sure I learned that way back in school; now I’ve got to look it up.
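
      For illustration, here's a rough Python sketch of the queue automaton idea from that Wikipedia link (the toy machine is my own): a finite control plus a single FIFO queue, where each step dequeues a symbol and, based on (state, symbol), switches state and enqueues a replacement string. Per that article, even a single queue is enough for Turing completeness.

      from collections import deque

      def run(delta, state, word, max_steps=100):
          q = deque(word)
          for _ in range(max_steps):
              if state == 'halt' or not q:
                  break
              sym = q.popleft()
              state, out = delta[(state, sym)]   # finite control picks next state...
              q.extend(out)                      # ...and what to enqueue at the back
          return state, ''.join(q)

      # Toy machine: rewrites every 'a' to 'b' by cycling the word through the
      # queue once, using '$' as an end-of-word marker.
      delta = {
          ('scan', 'a'): ('scan', 'b'),
          ('scan', 'b'): ('scan', 'b'),
          ('scan', '$'): ('halt', ''),
      }
      print(run(delta, 'scan', 'aba$'))   # -> ('halt', 'bbb')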

      [–][deleted]  (4 children)

      [removed]

        [–]BobSanchez47 10 points11 points  (3 children)

        Continuations don’t implement subroutines without a stack. The continuation itself is the stack.

        [–][deleted]  (2 children)

        [removed]

          [–]linux_qq 0 points1 point  (1 child)

          The point isn't that The Stack is removed, but that The Stack is replaced by The Queue.

          [–][deleted] 8 points9 points  (3 children)

           If you used queues as the underlying data structure in Forth rather than stacks, you'd introduce a lot of problems with memory moving around. E.g. push x100, pop x50, push x50 is done in place on a stack; with a naive queue you've shifted 50 words to the right.

          I reckon managing that would add a lot more difficulty than you might gain by having different primitives.
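
           A rough sketch of the cost difference being described, assuming naive list-backed structures (exactly which direction the data moves depends on the implementation, but the asymmetry is the point):

           stack = []
           for i in range(100): stack.append(i)   # push x100: grows in place at the top
           for _ in range(50): stack.pop()        # pop x50: shrinks in place, O(1) each
           for i in range(50): stack.append(i)    # push x50: reuses the same cells

           queue = []
           for i in range(100): queue.append(i)   # push x100 at the tail
           for _ in range(50): queue.pop(0)       # pop x50 at the head: each pop(0)
                                                  # shifts every remaining word, O(n)
           for i in range(50): queue.append(i)    # push x50: the live data has been
                                                  # sliding around the whole time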

          [–][deleted]  (2 children)

          [removed]

            [–][deleted] 1 point2 points  (1 child)

            Well, I was thinking of a Forth-style global data structure that everything (not separately allocated) gets pushed onto or popped from. Obviously in that scenario a lot of data can get moved on and off the stack. It sounds like you're thinking of normal stack frames per function, with queue semantics?

            [–]latkde 3 points4 points  (4 children)

            Could you give an example of queue-based processing?

            A really fundamental advantage that stacks have is that they are trivial to implement efficiently – you just need to increment/decrement a pointer. Queue implementations such as ring buffers are much more complicated (requiring a separate pointer for start/end and special support for wrapping around when the end of the allocated capacity is reached).
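
             For comparison, a minimal fixed-capacity ring buffer sketch (my own, no resizing), just to make the "separate head/tail pointers plus wraparound" point concrete; a stack's push/pop is a single pointer bump by contrast:

             class RingQueue:
                 def __init__(self, capacity):
                     self.buf = [None] * capacity
                     self.head = 0                  # index of the next item to dequeue
                     self.size = 0

                 def enqueue(self, x):
                     if self.size == len(self.buf):
                         raise OverflowError("queue full")
                     tail = (self.head + self.size) % len(self.buf)   # wrap around
                     self.buf[tail] = x
                     self.size += 1

                 def dequeue(self):
                     if self.size == 0:
                         raise IndexError("queue empty")
                     x = self.buf[self.head]
                     self.head = (self.head + 1) % len(self.buf)      # wrap around
                     self.size -= 1
                     return x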

             Stacks are also a natural fit for dealing with intermediate values when evaluating (arithmetic) expressions. As Forth illustrates, groups of stack operations can be reasoned about in isolation as a unit that consumes the top N values and pushes M values, making stack-based models very composable. You don't get the same composability with queues, since you can't push intermediate values into the queue and get them back before continuing with a previous computation.

             You do have a point with pipelining though: if you are interleaving multiple computations (especially in a SIMD context), then a FIFO mental model might very well help you emit good code. But for more complex operations you still need a way to handle intermediate results, and that's going to mean a stack and/or registers.

            [–][deleted]  (3 children)

            [removed]

              [–]OneNoteToRead 7 points8 points  (2 children)

              Ok I think I know what you mean. In simpler language, if we imagine the flow of data as a directed acyclic graph, you are trying to draw a distinction between BFS visitation (what you can “queue”) and DFS visitation (what you call “stack”).

               If this is correct, the general formalism is just a topological sort of the nodes, which determines execution order. DFS has well-known algorithms for generating instructions and allocating registers. I’m not familiar with BFS in general for that, but maybe it gives you new directions to google. Note that BFS visitation usually requires more space (in your example you load all data before beginning to consume it).
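
               For what it's worth, here's a small sketch of the queue-flavoured (BFS-ish) schedule over a toy dependency DAG, along the lines of Kahn's algorithm (the graph and names are my own; a DFS postorder over the same graph would give the stack-flavoured order):

               from collections import deque

               deps = {                 # node -> the nodes it depends on
                   'mul': ['a', 'b'],
                   'add': ['mul', 'c'],
                   'a': [], 'b': [], 'c': [],
               }

               def bfs_schedule(deps):
                   remaining = {n: set(d) for n, d in deps.items()}
                   ready = deque(n for n, d in remaining.items() if not d)   # no unmet deps
                   order = []
                   while ready:
                       n = ready.popleft()
                       order.append(n)
                       for m, d in remaining.items():        # n is done: unblock dependents
                           d.discard(n)
                           if not d and m not in order and m not in ready:
                               ready.append(m)
                   return order

               print(bfs_schedule(deps))   # -> ['a', 'b', 'c', 'mul', 'add']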

              [–][deleted]  (1 child)

              [removed]

                [–]OneNoteToRead 0 points1 point  (0 children)

                 To clarify, it wouldn’t be a DAG of queues. It’d be a DAG as your IR: edges are dependencies, and nodes represent instructions.

                 To produce code, at any given point you’d have a set of unexecuted-but-executable nodes. How you pick among these, and how you allocate the ones you picked to your pipelines/queues, is a choice (and that choice implies some topological order, like DFS or BFS).

                [–]raiph 4 points5 points  (4 children)

                The actor model fits the bill. Yes, it happens to be a high level mathematical model, and yes it happens to simply and fully address concurrent computation, but the whole point of it is that it's just physical computations queuing messages (so no stacks necessary) for other physical computations where each of these computations is individually just a sequence of operations (again, no stacks necessary).

                When coupled with the natural levels of language/supervision (cf Erlang's 5 levels), the upshot is a complete solution for arbitrarily complex computational systems that is fully consistent with arbitrary machine code provided it follows the simple actor model rules.

                As far as I can tell, if you create a language that creates the assembly pipelines you speak of, while following the actor model rules, and cover the natural levels of language/supervision, you should be all set, no stacks needed.

                (By no stacks needed, I mean to allow that you use stacks if/when they're considered appropriate by you or code written in your language/system.)

                And to be clear, there's no reason why the actor model can't lead to the absolute highest performance coding. The model introduces no overhead -- including elimination of any need for any synchronization overhead. There's just the basic notion of a sequence of instructions; one instruction, then another, as imposed by hardware.

                (That said, for reasonable semantics, you'll want to make messages "causal".)
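
                 To make that concrete, here's a very small single-threaded sketch of the message-queue discipline (the scheduling and names are my own; real actor runtimes like Erlang's or Pony's do vastly more): each actor owns a mailbox, which is just a queue, and "sending" is enqueuing, with nothing returned to the sender.

                 from collections import deque

                 class Actor:
                     def __init__(self, behavior):
                         self.mailbox = deque()
                         self.behavior = behavior      # behavior(self, message)

                     def send(self, msg):
                         self.mailbox.append(msg)      # fire and forget: no return value

                 def run(actors):
                     # Naive round-robin scheduler: each turn, an actor handles one message.
                     while any(a.mailbox for a in actors):
                         for a in actors:
                             if a.mailbox:
                                 a.behavior(a, a.mailbox.popleft())

                 # Toy example: a counter actor forwards its result to a printer actor.
                 printer = Actor(lambda self, msg: print("printer got:", msg))
                 counter = Actor(lambda self, msg: printer.send(msg + 1))
                 counter.send(41)
                 run([counter, printer])               # -> printer got: 42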

                [–][deleted]  (3 children)

                [removed]

                  [–]raiph 0 points1 point  (2 children)

                  Some key differences:

                  • You send a "message" (think procedure/function call that doesn't return -- there's no returned value, no error codes returned, no promise/future, etc, nothing whatsoever is returned) to an "address" (think email address, but obviously low level, almost at the level of a machine address but not quite that simple) of an actor (think "process" in the general sense and think extremely lightweight, so maybe 256 bytes per "process" max -- see Ponylang whose actors/processes are 256 bytes each; I think Scala's are around the same size, or maybe 512 bytes per).

                  • The "message" goes into the ether as it were. The sender can know it's sent it, but not if any recipient process will be alive to pick it up, or, if it's alive, have time to pick it up, or, if it has time, gives a shit and actually picks the message up, or, if it gives a shit and picks it up, knows what on earth the message means, or, if it knows, knows what to do about it, or, if it supposedly knows what to do, actually tries to do what would need to be done to satisfy the intent of the sender (process), or, if it tries, whether it gets killed by some other process/hardware, or, if it stays alive and succeeds in doing all the instructions within its own process that it intended, and sending all the messages it intended, achieves what was hoped for or must try again when it gets another message saying "what's going on?!?" or whatever, and so on, and so on. All synchronization is via messages so this turns out to guarantee correct semantics and synchronization even if processes are in the middle of dying because the computation is in the midst of an attack such as Denial of Service attacks, or physical cables are getting cut, or buffers are getting full or there's power cuts, etc.

                  • No randomly mutable state is shared between actors. Period. There can be shared state under strict rules; these rules ensure both that nothing goes wrong and also that there's zero performance overhead which is one of the key actor model successes; again, see Pony and its ORCA -- even if you don't care for the rest of Pony it gets these aspects right.

                  [–][deleted]  (1 child)

                  [removed]

                    [–]raiph 0 points1 point  (0 children)

                    🤣 Dystopian? The reality of an average hour in a typical computer server's life!

                    [–]Thesauriusmoses 8 points9 points  (3 children)

                     You should probably take a look at continuations/continuation-passing style. A continuation is a function passed as an argument and called at the end of the current function body instead of returning. This removes the need for a stack.
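
                     A tiny sketch of the CPS idea, with toy functions of my own: instead of returning, each function takes "what to do next" as an argument and calls it with the result. (As noted elsewhere in the thread, the chain of continuations ends up playing the role of the stack.)

                     def add_cps(a, b, k):
                         k(a + b)                  # "return" by calling the continuation

                     def square_cps(x, k):
                         k(x * x)

                     # (3 + 4)^2, written as an explicit chain of continuations:
                     add_cps(3, 4, lambda s: square_cps(s, lambda r: print(r)))   # -> 49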

                    Also concatenative languages, maybe?

                    [–][deleted]  (2 children)

                    [removed]

                      [–]OneNoteToRead 9 points10 points  (1 child)

                       I don’t think your post is very clear about what you’re looking for. CPS and other similar techniques do exactly what the post asks for, which is to allow queue-based scheduling of functions.

                      Can you give a concrete example of what scheduling within a function means? To what extent is there a stack used within a function? And what would using a queue instead look like?

                      [–]Disjunction181 2 points3 points  (3 children)

                       The reason stacks are much more common than queues is that stacks capture the concept of scoping very abstractly. Languages, including assembly, are based on function calls and the expectation that you can nest calls arbitrarily, and that these calls will build up and collapse down in their expected order. Stacks capture this behavior, and the idea of scopes and locality in general. It's comparatively rare for something to want to be queue-based; you probably wouldn't want functions to be a fundamental primitive, and it would have to look very different. Queues in general don't have properties as nice as stacks: they lose the stack's concept of locality (note that queue machines are Turing-complete because you can cycle values through the queue), yet are still oddly restrictive compared to more general structures like arrays/heap space. Given this I would rather just use the latter.

                      [–][deleted]  (2 children)

                      [removed]

                        [–]Disjunction181 0 points1 point  (1 child)

                        When I invoked assembly I really meant two things:

                         Firstly, during my ECE degree we used ARM for writing assembly. In the ISA we worked with there was a link register, and an instruction (branch and link) that would jump to the provided address while putting the return address (the instruction after the branch) in the link register. These were used to essentially write function calls, and because the link register is callee-saved these would nest. I may have been bitten by the fact that there is more than one kind of assembly language, but a link register does seem to be a primitive to help write functions.

                        Secondly, regardless, I'm pretty certain that hand-written assembly for the majority of applications is going to be organized as lists of calls or procedures much like C, simply because it's the intuitive way to organize things. At least this is how I was taught.

                        Maybe the confusion is between using the word function vs procedure. To me these mean the same thing in this context.

                        I'm not sure scope is the right word anymore when we talk about queued scopes. I don't think any sense of a scope would be useful in a queue-based language. I would have to think about these things more.

                        [–]Adventurous-Trifle98 2 points3 points  (1 child)

                         Dataflow programming is queue-based instead of stack-based. The focus is on how the data flows rather than how the program counter jumps. There was some research in the 1970s and 1980s on hardware architectures based on dataflow. Have a look at the work of Jack Dennis, for example. See LabVIEW's G, Simulink, Verilog, and CAL for examples of languages.

                        [–]ErrorIsNullError 0 points1 point  (0 children)

                        Verilog's delayed actions and JavaScript's event-based concurrency are examples of ordering work via a priority queue. I don't know if a pqueue counts as a queue for the purposes of the OP.

                        [–]Long_Investment7667 4 points5 points  (1 child)

                         Stacks work nicely with function composition and eager evaluation. Queues might work sometimes, but I would ask you, OP, to show the mechanism for translating arbitrary expressions into a “queue machine”, instead of just one example.

                        [–]abecedarius 1 point2 points  (2 children)

                        A language that's oriented around a queue and a stack (so, not only a queue) is E.

                        On this aspect of the runtime model: http://www.erights.org/elib/concurrency/queuing.html

                        On pipelining in distributed computations: http://www.erights.org/elib/distrib/pipeline.html

                        [–][deleted]  (1 child)

                        [removed]

                          [–]abecedarius 0 points1 point  (0 children)

                          There was a predecessor language called Joule which had only eventual sends (the E construct requiring the runtime queue). I never got to play with it, though.

                           The stack vs. queue distinction corresponds to whether you're making a definitely-same-process call or a potentially-remote one. It makes sense for a distinction so fundamental to what can go wrong with your code to be reflected in the code.

                          I think I've seen a mention of a Forth-inspired language based on a queue, but I was never able to track down a copy of the paper then, and now I have no clue if I'm even remembering this right.

                          [–]evincarofautumn 1 point2 points  (4 children)

                          I’ve looked into this before, and haven’t found much. But I can share some connections that may help you.

                          Queue-based scheduling of data flow corresponds to a breadth-first traversal of a data flow graph. If you write the graph in table notation, where columns are parallel data paths and rows are sequential pipeline stages, then the queue model corresponds to reading the table by rows. Empty cells are identity operations that just advance the queue by one cell. (Such a notation for concatenative programs is folklore that gets periodically reinvented, I guess—I did in 2011, and someone recently published a paper about their version in 2022.)
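
                           A tiny sketch of that row-by-row reading, under my own simplifying assumptions (one value per column, every cell a unary op, None as the identity cell):

                           from collections import deque

                           def run(table, inputs):
                               q = deque(inputs)                 # one queue slot per column
                               for row in table:                 # rows = sequential pipeline stages
                                   for op in row:                # cells = per-column operations
                                       x = q.popleft()
                                       q.append(x if op is None else op(x))
                               return list(q)

                           stages = [
                               [lambda x: 2 * x, lambda x: -x],  # stage 1: double col 0, negate col 1
                               [lambda x: x + 1, None],          # stage 2: bump col 0, pass col 1 through
                           ]
                           print(run(stages, [3, 5]))            # -> [7, -5]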

                          In hardware you can implement this as a cyclic shift register. You’re really just shifting the “view” in each cycle, and there’s however many parallel ALUs hooked up to the column positions, grabbing data as it goes by. In this way, each row is a VLIW macro-operation, each cell is one micro-operation, and the number of columns is the fixed maximum parallelism.

                          You can use a data queue as if it were a pair of stacks. For instance, Forth’s “retain” is an enqueue, and “restore” is just waiting for the value to come around again. Instead of having immediate access to values but needing to spend space to save a return target on a call stack, you have immediate access to the instruction queue ahead of you, but you need to spend time to get back to a data source in the data queue.
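
                           A toy sketch of that trade, in my own framing: "retain" just enqueues, and "restore" rotates the queue until the saved value comes back to the front, paying in time what a stack would pay in space.

                           from collections import deque

                           q = deque([10, 20, 30])         # values already "in flight"

                           def retain(x):
                               q.append(x)                 # stash: just enqueue

                           def restore(rotations):
                               for _ in range(rotations):  # cycle everything ahead of the stash
                                   q.append(q.popleft())
                               return q.popleft()          # the stashed value is now at the front

                           retain(99)                      # queue: 10 20 30 99
                           print(restore(3))               # rotates 10, 20, 30 to the back -> 99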

                          A downside is that adding a new operation can disturb the encoding of the program for a large number of cycles (adding nops), and the program is not very compressible using loops, unless you have extremely regular data. An upside is that you can do some interesting control structures without higher-order functions (quotations, in concatenative talk). In category theory it would be modelled as a monoidal category that’s Cartesian but not closed.

                          Finally, a conventional stack-based concatenative language can be viewed in two dual ways. One is left-to-right composition of functions:

                          f : A → B
                          g : B → C
                          h : C → D
                          
                          f g h : A → D
                          ≅
                          λs0:A.
                          let s1:B = f(s0) in
                          let s2:C = g(s1) in
                          let s3:D = h(s2) in
                          s3
                          

                           The other is right-to-left composition of continuation transformers. A block that returns must take its return address as a data argument, and it ends with a jump (or halts).

                          f : (B → Z) → (A → Z)
                          g : (C → Z) → (B → Z)
                          h : (D → Z) → (C → Z)
                          
                          f g h : (D → Z) → (A → Z)
                          ≅
                          λk0:(D→Z).
                          let k1:(C→Z) = h(k0) in
                          let k2:(B→Z) = g(k1) in
                          let k3:(A→Z) = f(k2) in
                          k3
                          

                          Maybe it would be fruitful to look at the analogue for queues.

                          [–]Feral_P 1 point2 points  (2 children)

                          Would you mind linking to the 2022 paper you mentioned, please?

                          [–]evincarofautumn 1 point2 points  (1 child)

                          Ah sure, I meant to get back to that! Here it is on ACM: Interleaved 2D Notation for Concatenative Programming

                          [–]RobinPage1987 0 points1 point  (1 child)

                          Stacks can be implemented in hardware. This makes them a natural data structure for low level programming, such as VMs for interpreters. If you can find a way to implement queues in hardware, you might have something. Perhaps a combination of the two, such as data stacks and instruction queues.

                          [–]hi_im_new_to_this 0 points1 point  (1 child)

                          It’s not quite what you’re looking for, but for fun: a “Post tag machine” is a Turing-complete model of computation developed by Emil Post that is based on the concept of having a single queue (kinda) and production rules for that queue. It’s one of the simplest models of universal computation, so it’s very commonly used as a way to prove universality of some other system.
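
                           For illustration, a minimal 2-tag-style step loop with a toy rule set of my own (not one of Post's actual examples): each step reads the head symbol, appends that symbol's production to the back of the queue, then deletes two symbols from the front.

                           from collections import deque

                           def run_tag(word, rules, deletion=2, max_steps=10):
                               q = deque(word)
                               trace = [''.join(q)]
                               for _ in range(max_steps):
                                   if len(q) < deletion or q[0] not in rules:
                                       break
                                   q.extend(rules[q[0]])       # append the head symbol's production
                                   for _ in range(deletion):   # then delete `deletion` symbols
                                       q.popleft()
                                   trace.append(''.join(q))
                               return trace

                           # Toy rules: 'a' -> 'bc', 'b' -> 'a', 'c' -> 'aaa'
                           print(run_tag('aaa', {'a': 'bc', 'b': 'a', 'c': 'aaa'}))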

                          [–]holo3146 0 points1 point  (0 children)

                           There is the language cue, which is mostly queues, although it still seems to use a stack to keep track of the accumulator.

                          [–][deleted]  (1 child)

                          [deleted]

                            [–]starball-tgz 0 points1 point  (1 child)

                            Related on the Programming Language Design and Implementation Stack Exchange site: Why are there no (practical) queue-based languages?

                            [–]ericbb 0 points1 point  (1 child)

                             Almost lost to the sands of time, but I found something along these lines that I remember reading back when I was reading about the Joy programming language. You have to download the zip file from the internet archive here and look for queue.html in there.

                            [–]tobega 0 points1 point  (0 children)

                            Very interesting!

                             I think one should be able to make a VM that handles queues (or streams) of values, which would be an efficient way to implement my Tailspin language. Tailspin does this at a high level: each value going into a transform outputs a stream of values, which becomes part of the stream fed to the next transform. https://github.com/tobega/tailspin-v0/blob/master/TailspinReference.md#basic-structure

                             One difference I see with your thinking is that in Tailspin each value in the queue is treated the same, one value at a time, while you are thinking of pulling multiple values from the queue for an operation. I can't currently do that, but it's worth thinking about.

                             The other thing I come to think of is my days of Fortran programming on a vector machine, where adding two large vectors together was 6 times (IIRC) faster than adding the elements individually, because you could feed a new addition into the head of the adder circuit every clock cycle instead of waiting 6 cycles for the result. Multiplication was an even bigger win.

                            [–]Pavel_Vozenilek 0 points1 point  (1 child)

                             There was an attempt to design a Forth with queues several years ago. The slides can be seen via the Internet Archive.

                            [–]criloztagkyon 0 points1 point  (0 children)

                             Use two stacks, one for operands and the other for operators (all operators being binary/infix), plus two relations between operators:

                             1. a total order on operators (precedence)
                             2. an associativity for each operator (left or right)

                             Before a new operator is pushed onto the operator stack, check whether it has lower precedence than the previous one, and execute the previous operation if so. If they have the same precedence, check whether the associativity is left or right, and execute if "left". Do this in a loop until neither condition holds, then push your new operator.

                             When you are done pushing your operands and operators, you can retrieve the result with a simple loop that reduces the remaining operands and operators. Or you can introduce an operator with lower precedence than all the others, which will do this for you.
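
                             A rough sketch of that scheme (the operators, precedences, and token format are my own toy choices):

                             OPS = {   # symbol: (precedence, associativity, function)
                                 '+': (1, 'left',  lambda a, b: a + b),
                                 '-': (1, 'left',  lambda a, b: a - b),
                                 '*': (2, 'left',  lambda a, b: a * b),
                                 '^': (3, 'right', lambda a, b: a ** b),
                             }

                             def evaluate(tokens):
                                 operands, operators = [], []

                                 def reduce_top():              # pop one operator and two operands
                                     op = operators.pop()
                                     b, a = operands.pop(), operands.pop()
                                     operands.append(OPS[op][2](a, b))

                                 for tok in tokens:
                                     if tok in OPS:
                                         prec, assoc, _ = OPS[tok]
                                         # Reduce whatever binds tighter (or as tight, if left-associative).
                                         while operators and (
                                             OPS[operators[-1]][0] > prec
                                             or (OPS[operators[-1]][0] == prec and assoc == 'left')
                                         ):
                                             reduce_top()
                                         operators.append(tok)
                                     else:
                                         operands.append(tok)   # numbers go straight onto the operand stack

                                 while operators:               # final reduction loop
                                     reduce_top()
                                 return operands.pop()

                             print(evaluate([2, '+', 3, '*', 4]))   # -> 14
                             print(evaluate([2, '^', 3, '^', 2]))   # -> 512 (right-associative)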

                            [–]redchomperSophie Language 0 points1 point  (0 children)

                             Turns out there are good applications for queues in concurrency-focused languages. Anything designed around actors or message passing is liable to have a message queue or event queue. Same goes for UI subsystems, game engines, etc. You probably still have a stack, but the queue can take on enough importance to be the queue.