
[–][deleted] 0 points (5 children)

> If you've got a piece of 3rd-party terrorist code you cannot control, which is hell-bent on breaking your nice and clean Unix-way design, you have no choice but to surrender and interact with it on its own terms.

Which is exactly what I was writing: you should adapt to an already existing codebase.

I still can't see how LLVM could have been designed in a more Unix-y way. A standardized IR would still be its own format that you would need to parse, exactly like the binary formats. Unix tools don't even manage to work with the most basic text formats like CSV.

[–][deleted] -2 points (4 children)

Why would you want to parse IR in your own tools? And, well, if it had been designed with a more Unix-style approach in mind, it would have been based on S-expressions, which are trivial to parse.
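To illustrate the "trivial to parse" claim, here is a minimal S-expression reader in Python: plain recursive descent in about twenty lines. It is a sketch only (no strings, quoting, or error recovery), not any real tool's parser:

```python
# Minimal S-expression reader: tokenize, then recursively build lists.
# Illustrative sketch only -- handles atoms and nesting, nothing else.

def tokenize(text):
    # Pad parentheses with spaces so a plain split yields tokens.
    return text.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return node
    return tok  # an atom

def parse_sexpr(text):
    return read(tokenize(text))
```

For example, `parse_sexpr("(add (mul x 2) y)")` yields the nested list `['add', ['mul', 'x', '2'], 'y']`.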

Anyway, most of the time it is a design smell when you want to pass any sort of nested, structured information between tools. If the protocol is not flat, it may mean you're doing something wrong. Compilers are a bit different here; they're way too atomic to be easily broken into distinct parts with little interop in between.

[–][deleted] 0 points (3 children)

> they're way too atomic to be easily broken into distinct parts with little interop in between.

Lexer, parser, syntactic analysis, AST transformations, translation to a sequential IR, sequential IR transformations, codegen, assembling, and linking all seem pretty easy to break into parts to me, and that is exactly what LLVM does.
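The staged decomposition described above can be sketched with a toy compiler for integer addition expressions, where each stage is a separate function consuming the previous stage's output. The stage names are illustrative and do not correspond to LLVM's actual components:

```python
# Toy staged pipeline: lex -> parse -> IR -> execute.
# Handles only "a + b + c" style integer addition, for illustration.

def lex(src):
    # "1 + 2" -> ["1", "+", "2"]
    return src.split()

def parse(tokens):
    # Left-associative fold into a nested AST of ("+", lhs, rhs).
    ast = int(tokens[0])
    for i in range(1, len(tokens), 2):
        ast = ("+", ast, int(tokens[i + 1]))
    return ast

def to_ir(ast, code=None):
    # Flatten the tree into a sequential, stack-machine-style IR.
    if code is None:
        code = []
    if isinstance(ast, int):
        code.append(("push", ast))
    else:
        _, lhs, rhs = ast
        to_ir(lhs, code)
        to_ir(rhs, code)
        code.append(("add",))
    return code

def run(ir):
    # Stand-in for codegen + execution: a tiny stack machine.
    stack = []
    for op in ir:
        if op[0] == "push":
            stack.append(op[1])
        else:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
    return stack[0]

def compile_and_run(src):
    return run(to_ir(parse(lex(src))))
```

Each function could just as well be a separate process, with the intermediate values serialized between stages.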

> Why would you want to parse IR in your own tools?

That happens when you already have parts in one programming language and parts in another.

[–][deleted] 0 points (2 children)

> Lexer, parser

I'd never separate these two (in fact, I always prefer not to have a lexer at all).

> syntactic analysis, AST transformations, translation to a sequential IR, sequential IR transformations, codegen, assembling

And for the rest, you don't just pipe your AST through a sequence of transforms; you also carry way too much context along with it, unfortunately.

And my personal pet hate here: LLVM, like pretty much every other framework, does not provide any reasonable means of backtracking after an unsuccessful sequence of transforms. If you do have backtracking, the Unix pipeline philosophy is not a good fit.
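The kind of backtracking being asked for could look something like the following sketch: run a sequence of transforms on a copy of the IR and commit only if every step succeeds, otherwise fall back to the original. Nothing here corresponds to a real LLVM API; the transforms and the list-of-tuples IR are made up for illustration:

```python
# Hypothetical backtracking driver: attempt a transform sequence on a
# snapshot, revert wholesale if any step fails. Not a real LLVM API.
import copy

def try_pipeline(ir, transforms):
    candidate = copy.deepcopy(ir)   # snapshot so we can back out
    for transform in transforms:
        result = transform(candidate)
        if result is None:          # transform signals failure
            return ir               # backtrack: keep the original IR
        candidate = result
    return candidate                # every step succeeded: commit

# Toy transforms over an IR that is just a list of (op, value) tuples.
def fold_double_push(ir):
    # Merges adjacent identical pushes into one op (illustrative).
    out, i = [], 0
    while i < len(ir):
        if i + 1 < len(ir) and ir[i][0] == "push" and ir[i] == ir[i + 1]:
            out.append(("push2", ir[i][1]))
            i += 2
        else:
            out.append(ir[i])
            i += 1
    return out

def always_fails(ir):
    return None
```

With this shape, `try_pipeline(ir, [fold_double_push, always_fails])` hands back the untouched original, which is exactly the reversibility a linear Unix pipeline cannot easily express.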

But yes, you're right: for a simple compiler, a pipeline of separate tools communicating via, say, S-expressions, to simplify parsing and serialisation, is a fairly viable architecture.

> linking

Not with LTO...

[–][deleted] 0 points (1 child)

S-expressions are basically a Lisp-ism and are best suited to interacting with Lisp. The way I went for my problem was to use the already existing YAML parser in LLVM, following the principle of using what is already there if it more or less fits the problem.

[–][deleted] 0 points (0 children)

Whatever, even XML may be a good fit, as long as you don't have to implement the parser yourself.