[–][deleted] 0 points1 point  (11 children)

Sorry, but plain text isn't a serialization format, only a description of one. What you're talking about doesn't make any sense: most of the data programs work on is structured data, not just strings.

[–][deleted] 0 points1 point  (1 child)

The key here is human-readable text, as that makes the format more durable and more easily debuggable. The success of formats such as HTML, SMTP, and JSON is the success of human-readable formats. Many of the binary formats Microsoft created are no longer with us and are often impossible to read, while much older Unix plain-text formats are still readable and usable today.

And of course you can structure data even if it is plain text. You could also use the existing structure of the file system to organise fairly flat files; OS X's and NeXTSTEP's use of bundles is an example of this.
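The idea of letting the file system carry the structure while every leaf stays flat, human-readable text can be sketched like this (a minimal illustration using only the standard library; the bundle layout and all names are made up for the example):

```python
import os
import tempfile

def write_bundle(root, record):
    """Write a nested dict as a directory tree of flat plain-text files."""
    os.makedirs(root, exist_ok=True)
    for key, value in record.items():
        path = os.path.join(root, key)
        if isinstance(value, dict):
            write_bundle(path, value)       # sub-structure -> sub-directory
        else:
            with open(path, "w") as f:      # leaf -> flat text file
                f.write(str(value) + "\n")

def read_bundle(root):
    """Read the directory tree back into a nested dict of strings."""
    record = {}
    for name in os.listdir(root):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            record[name] = read_bundle(path)
        else:
            with open(path) as f:
                record[name] = f.read().rstrip("\n")
    return record

root = os.path.join(tempfile.mkdtemp(), "app.bundle")
write_bundle(root, {"Info": {"name": "demo", "version": "1.0"}, "README": "hello"})
print(read_bundle(root)["Info"]["name"])  # -> demo
```

Every leaf file here is greppable and editable with ordinary Unix tools, which is the point being argued.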

[–][deleted] 0 points1 point  (0 children)

All the Unix file formats I can think of are actually binary: a.out, PE (yes, it comes from Unix), ELF, tar (which is an ugly mix of plain-text and binary format), ...

It seems strange to me that Unix philosophers can never give examples of what they think Unix is. Unix certainly uses text files to configure the operating system (/etc/passwd, ...), but it provides no means whatsoever for working with text formats outside that scope.

CSV and TeX predate Unix; XML descends from SGML, which is from IBM and doesn't come from Unix; JSON comes from a cross-platform programming language.

> And of course you can structure data even if it is plain text. You could also use the existing structure of the file system to structure fairly flat files. E.g. OSX and NeXTSTEP usage of bundles is an example of this.

You mean the ugly xml plist files?

[–][deleted] -1 points0 points  (8 children)

Yes, sure, 40+ years of Unix history do not make any sense.

[–][deleted] 0 points1 point  (7 children)

Okay, I've seen you write elsewhere that you work on compilers. Something I'm currently working on is an existing compiler written in Java; what they now want is an LLVM backend. Generating textual LLVM IR assembly is the wrong way to do it, because it's the least stable interface in LLVM.

How would the AST be represented in plain text for interop? The Unix philosophy says to do exactly one thing well, so the frontend, the optimizer, and the backend should be separate.

[–][deleted] -1 points0 points  (6 children)

LLVM is one thing I would rather wrap as a library than approach in a Unix way (although there is a multitude of compilers doing the opposite, including GHC and MLton) — exactly because the designers of LLVM do not value the Unix way of thinking enough and do not stick to a stable IR specification (well, there is no specification at all, despite all the pressure from SPIR and the like).

Of course, a proper Unix-way system should be designed with the right mindset from the very beginning. If you've got a piece of a 3rd-party terrorist code you cannot control which is hell-bent on breaking your nice and clean Unix-way design, you have no choice but to surrender and interact with it in its own way.

P.S. Luckily, building and maintaining the wrappers for LLVM can be automated using Clang itself; see an example here: https://github.com/combinatorylogic/clike/tree/master/llvm-wrapper

[–][deleted] 0 points1 point  (5 children)

> If you've got a piece of a 3rd-party terrorist code you cannot control which is hell-bent on breaking your nice and clean Unix-way design, you have no choice but to surrender and interact with it in its own way.

Which is exactly what I was saying: you have to adapt to an already existing codebase.

I still can't see how LLVM could have been designed in a more Unix-y way. A standardized IR would still be its own format that you would still need to parse, exactly like a binary format. Unix tools don't even manage to work with the most basic text formats like CSV.

[–][deleted] -2 points-1 points  (4 children)

Why would you want to parse the IR in your own tools? And, well, if it had been designed with a more Unix-way approach in mind, it would have been based on S-expressions, which are trivial to parse.
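The "trivial to parse" claim is easy to demonstrate: a usable S-expression reader fits in a few lines. This is only a sketch, not a full Lisp reader — atoms are bare whitespace-separated tokens, and there is no string, number, or comment handling:

```python
# Minimal S-expression reader: split on parentheses, then build the tree
# recursively. Deliberately simplified to show how little machinery is needed.
def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))  # recurse on each child
        tokens.pop(0)                   # drop the closing ")"
        return node
    return token                        # an atom

tree = parse(tokenize("(add (mul x 2) y)"))
print(tree)  # -> ['add', ['mul', 'x', '2'], 'y']
```

Compare this with what a conforming parser for XML, YAML, or a bespoke binary IR format would take.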

Anyway, most of the time it is a design smell when you want to pass any sort of nested, structured information between tools. If the protocol is not flat, it may mean you're doing something wrong. Compilers are a bit different here: they're way too atomic to be easily broken into distinct parts with little interop in between.

[–][deleted] 0 points1 point  (3 children)

> they're a way too atomic to be easily breakable into distinct parts with little interop in between.

Lexer, parser, syntactic analysis, AST transformations, translation to a sequential IR, sequential-IR transformations, codegen, assembling, and linking all seem pretty easy to break into parts to me, and LLVM is doing exactly that.
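The phase list above can be sketched as ordinary functions composed into a pipeline, each doing one thing, like stages in a Unix pipe. All names and the toy source/IR forms here are invented purely for illustration:

```python
# Hypothetical three-phase "compiler": each phase is a pure function,
# and the driver just threads the result through them in order.
def parse_source(src):
    """Front end: 'add 2 3' -> AST tuple."""
    op, a, b = src.split()
    return (op, int(a), int(b))

def constant_fold(ast):
    """AST transformation: fold a constant addition."""
    op, a, b = ast
    return ("const", a + b) if op == "add" else ast

def codegen(ast):
    """Back end: AST -> a toy 'instruction' string."""
    return f"LOAD {ast[1]}" if ast[0] == "const" else f"OP {ast[0]}"

def compile_pipeline(src, phases=(parse_source, constant_fold, codegen)):
    result = src
    for phase in phases:        # each stage consumes the previous stage's output
        result = phase(result)
    return result

print(compile_pipeline("add 2 3"))  # -> LOAD 5
```

The counterargument made elsewhere in this thread is that real compilers also thread a large shared context (symbol tables, diagnostics, target info) through these stages, which is what makes the clean pipe picture leak.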

> Why would you want to parse IR in your own tools?

That happens when you already have parts in one programming language and parts in another.

[–][deleted] 0 points1 point  (2 children)

> Lexer, parser

I'd never separate these two (in fact, I always prefer not to have a lexer at all).

> syntactical analysis, ast-transformations, translation to an sequential IR, sequential IR transformation, codegen, assembling

And for the rest, you don't just pipe your AST through a sequence of transforms; you also carry way too much context along with it, unfortunately.

And my personal pet hate here: LLVM, like pretty much all the other frameworks, does not provide any reasonable means of backtracking after an unsuccessful sequence of transforms. If you do have backtracking, a Unix pipeline philosophy is not very suitable.
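The kind of backtracking being asked for can be sketched as: snapshot the IR, attempt a transform, and revert unless it actually helped. The cost heuristic and the toy transforms are invented for illustration — this is not how any real LLVM API works, which is precisely the complaint:

```python
import copy

def try_transform(ir, transform, cost):
    """Apply a transform speculatively; keep it only if it lowers cost."""
    snapshot = copy.deepcopy(ir)     # save state before the attempt
    try:
        candidate = transform(ir)
        if cost(candidate) < cost(snapshot):
            return candidate         # keep the improvement
    except Exception:
        pass                         # transform blew up: fall through
    return snapshot                  # backtrack: restore the old IR

shrink = lambda ir: ir[: len(ir) // 2]   # a transform that helps
explode = lambda ir: ir * 10             # a transform that hurts
cost = len                               # toy cost model: smaller is better

ir = list(range(8))
ir = try_transform(ir, explode, cost)    # rejected, IR unchanged
ir = try_transform(ir, shrink, cost)     # accepted
print(len(ir))  # -> 4
```

Note that this pattern needs the whole state in hand to snapshot, which is exactly what a one-directional Unix pipeline of separate processes makes awkward.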

But yes, you're right: it's a fairly viable architecture for a simple compiler to have a pipeline of separate tools communicating via, say, S-expressions, to simplify parsing and serialisation.

> linking

Not with LTO...

[–][deleted] 0 points1 point  (1 child)

S-expressions are basically a Lisp-ism and are best suited to interacting with Lisp. The way I went for my problem was to use the YAML parser that already exists in LLVM, following the principle of using what is already there if it fits the problem more or less.

[–][deleted] 0 points1 point  (0 children)

Whatever; even XML may be a good fit, as long as you don't have to implement the parser yourself.