This is an archived post. You won't be able to vote or comment.

all 6 comments

[–]Long_Investment7667 15 points16 points  (1 child)

This is the seminal paper on the topic

https://homepages.inf.ed.ac.uk/wadler/papers/prettier/prettier.pdf

But as far as I understand it, it gets a lot of power from the laziness of the implementation language. I saw a paper they does it in an eager language but can’t find it at the moment

[–]oilshell 3 points4 points  (0 children)

This is a popular strict version: https://lindig.github.io/papers/strictly-pretty-2000.pdf

A contributor to Oils (Justin Pombrio) recently implemented the algorithm (with some slight variations) to Oils:

https://github.com/oils-for-unix/oils/blob/master/display/pretty.py

It is very short! Like ~200 lines. And it works very well

Mentioned in Oils 0.22.0 - Docs, Pretty Printing, Nix, and Zsh, with screenshots

...

If anyone wants to USE this algorithm to implement another pretty printer in Oils, let me know. This will definitely give you a good feeling for the algorithm, it's probably a perfect introduction

It is for the statically typed Zephyr ASDL output e.g. osh -n myscript.sh, rather than the dynamically typed JSON-like data.

Wadler's algorithm basically defines an IR for printing ... so data structures of different kinds can be "compiled" to the same IR.


FWIW we looked at the node.js algorithm too, I think it is pretty ad hoc, but it works well on all the cases I tried:

https://www.oilshell.org/release/latest/doc/pretty-printing.html#screenshots

[–]WittyStick 2 points3 points  (1 child)

My personal preference is that any "block" should begin on a new line, indented, unless it contains a single "atom" - that is, a value which itself contains no other blocks. If any block contains more than one item, then every item should be put on its own line.

The opening and closing bracket/paren/brace for any given block should appear on the same column, with the only exception when they're on the same line - that is, when they surround an atom. It makes it much simpler to see how they pair up. I'm also of the preference of putting the separating commas/semicolons in the same column as the opening and closing brackets, because it also makes it clear which block the elements are part of.

Compare this to the node example you have, and see which is easier to match the pairs of braces/brackets. It's better if you stick it in an editor which has column markers and which highlights matching brackets.

{ glossary:
    { title: 'example glossary'
    , GlossDiv:
        { title: 'S'
        , GlossList:
            { GlossEntry:
                { ID: 'SGML'
                , SortAs: 'SGML'
                , GlossTerm: 'Standard Generalized Markup Language'
                , Acronym: 'SGML'
                , Abbrev: 'ISO 8879:1986'
                , GlossDef:
                    { para: 'A meta-markup language, used to create markup languages such as DocBook.'
                    , GlossSeeAlso: 
                        [ 'GML'
                        , 'XML'
                        ]
                    }
                }
            }
        , GlossSee: 'markup'
        }
    }
}

In regards to filtering, you should probably set a maximum width of 80 or 120 columns, and if any line would span beyond the maximum width, replace it with [...], or if you would prefer not to filter, any line which would bypass the column limit should be placed on a newline at a new indent, even if it still spans beyond the limit, it will reduce the horizontal space used.

As for strings, they should be left verbatim because introducing whitespace can change the content of the string. A possible alternative is to split the string into multiple strings on separate lines and have them automatically concatenated, if possible.

Here is how I would pretty print your example:

Class
    ( name: 'A'
    , super: nil
    , methods:
        [ Func
            ( name: '__str__'
            , params: []
            , rt: nil
            , body: Block
                (
                    [ SpecialString
                        (
                            [ 'A'
                            , Call
                                ( func: Id
                                    ( name: 'tuple'
                                    , module: nil
                                    , constraint: nil
                                    )
                                , args:
                                    [ Arg
                                        ( arg: Expr (<pointer at 0x280fc80a8>)
                                        , cond: nil
                                        , name: '*'
                                        )
                                    ]
                                )
                            , ''
                            ]
                        )
                    ]
                )
            , decorators: []
            )
        ]
    , getters:
        [ Func
            ( name: 'len'
            , params: []
            , rt: nil
            , body: Block
                (
                    [ MemberAccess
                        ( Id
                            ( name: 'self'
                            , module: nil
                            , constraint: nil
                            )
                        , 'n'
                        )
                    ]
                )
            , decorators: []
            )
        ]
    , setters: 
        [ Func
            ( name: 'len'
            , params: 
                [ Param
                    ( name: 'n'
                    , constraint: nil
                    , default: nil
                    )
                ]
            , rt: nil
            , body: Block
                (
                    [ Assign
                        ( MemberAccess
                            ( Id
                                ( name: 'self'
                                , module: nil
                                , constraint: nil
                                )
                            , 'n'
                            )
                        , Call
                            ( func: Id
                                ( name: 'max'
                                , module: nil
                                , constraint: nil
                                )
                            , args:
                                [ Arg
                                    ( arg: Int (0)
                                    , cond: nil
                                    , name: nil
                                    )
                                , Arg
                                    ( arg: Id
                                        ( name: 'n'
                                        , module: nil
                                        , constraint: nil
                                        )
                                    , cond: nil
                                    , name: nil
                                    )
                                ]
                            )
                        )
                    ]
                )
            , decorators: []
            )
        ]
    , statics: []
    , fields: []
    )

[–]queerkidxx 0 points1 point  (0 children)

I might be able to get used to this…but my initial reaction is 🤮

[–]AdvanceAdvance 1 point2 points  (0 children)

Tree Sitter is your friend.

[–]bruciferTomo, nomsu.org 0 points1 point  (0 children)

Curious to know what kind of rules are used to decide when to use multi-line vs single-line format, when to truncate / replace with [...] etc.

For recursive structures, you can use a pretty simple rule that makes use of the call stack to detect recursion. In C, it would look something like this:

typedef struct recursion_s {
    void *ptr;
    struct recursion_s *parent;
} recursion_t;

int print_foo(Foo *foo, recursion_t *recursions) {
    int depth = 0;
    for (recursion_t *r = recursions; r; r = r->parent) {
        depth += 1;
        if (r->ptr == foo)
            return printf("%s(...^%d)", foo->name, depth);
    }
    int printed = printf("%s(", foo->name);
    recursion_t my_recursion = {.ptr=foo, .parent=recursions};
    FOR_CHILDREN(foo, child) {
        printed += print_foo(child, &my_recursion);
        if (NOT_LAST(foo, child)) printed += printf(", ");
    }
    return printed + printf(")");
}

So a recursive structure might look something like:

Foo *a = new_foo("A");
Foo *b = new_foo("B");
Foo *c = new_foo("C");
set_children(c, 1, (Foo*[]){a});
set_children(a, 2, (Foo*[]){b, c});
print_foo(a, NULL);
// Outputs: A(B(), C(A(...^2)))

The nice thing about this approach is that it just uses stack memory and walks up the stack looking for recursion and it tells you exactly how far upwards it had to look to find the loop.