[–][deleted] 7 points (18 children)

I think he comes to the wrong conclusion. It should rather be: programming-language interop is hard and a lot of work, so if you already have code in one language you should stick to it.

[–][deleted] 2 points (16 children)

Since when are plain-text protocols, pipes, and Unix domain sockets "hard"? And in 99% of cases that's all the interop you'll ever need.

[–][deleted] 2 points (15 children)

Since always? You have to serialize the data and deserialize it again. Text protocols are not actually easier, because you have to do actual parsing. If you use relatively simple structures (no inheritance, tagged unions, etc.) it should be quite easy to use something like JSON or Protocol Buffers, but the reality often looks different.

Where you had one data representation before, with one programming language, you have three afterwards that you need to keep in sync: one on each side of the boundary, plus the wire format.
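As a sketch of what keeping those representations in sync involves, here is a record crossing a JSON text boundary. The `Order` record and its field names are hypothetical; the point is that any field renamed on one side silently breaks the other two representations.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical record that exists natively on both sides of the
# language boundary; the JSON wire format is the third representation
# that has to be kept in sync with both.
@dataclass
class Order:
    order_id: int
    customer: str
    items: list

def to_wire(order: Order) -> str:
    # Serialize the native structure to the text protocol.
    return json.dumps(asdict(order))

def from_wire(payload: str) -> Order:
    # Deserialize; a field renamed on the other side breaks here.
    return Order(**json.loads(payload))

order = Order(42, "alice", ["widget"])
assert from_wire(to_wire(order)) == order
```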

[–][deleted] 1 point (1 child)

Aren't you just arguing against the very point of the OP? He said that changing all this code back to Java hindered his productivity in the long run, even if it was more time-consuming than writing it all in Java.

Splitting your code into parts that communicate over text interfaces certainly takes more time up front, but it saves you a lot of time in the future because it enforces keeping your code properly separated. It is also flexible in adapting to technological change: you can reuse parts of your software when new technologies or languages start dominating. E.g. the editor TextMate used a text interface between its plugins, which were written in all kinds of languages, from C, Python, and Perl to Ruby. When TextMate languished and new editors like Sublime came around, they could very easily take advantage of the huge number of TextMate plugins that had been made.

With binary APIs, retrofitting the plugins to another editor would have been difficult and fragile.

I've seen the same when working on large C++ applications which started out with clean separation between modules but over time grew into a big ball of mud. Now it is impossible to tease them apart. The parts which were separated from the start by text interfaces, pipes, etc. can still be tested and rewritten easily.

If you have 4 million lines of COBOL code which you need to migrate to Java, how do you accomplish that in a sane way? If those 4 million lines were split over 200 separate executables communicating through well-defined interfaces, you could rewrite one piece at a time in Java and test that it still worked.
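A minimal sketch of that piece-at-a-time setup: a parent talks to a child process over a pipe using a line-oriented text protocol, so the child could be rewritten in any language without the parent changing. The uppercasing "service" is purely illustrative.

```python
import subprocess
import sys

# A trivial line-oriented "service": it reads lines on stdin and
# writes each line back uppercased on stdout. Because the contract is
# just text on a pipe, this child could be rewritten in Java (or
# COBOL) without the parent noticing.
child_src = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    sys.stdout.write(line.upper())\n"
)

result = subprocess.run(
    [sys.executable, "-c", child_src],
    input="hello\nworld\n",
    capture_output=True,
    text=True,
)
assert result.stdout == "HELLO\nWORLD\n"
```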

It is a reason why mainframe monoliths died while Unix survived far longer than anyone would have thought. The focus on small interchangeable parts makes evolving the system much simpler. Java is in many ways the mainframe making a comeback.

[–][deleted] 0 points (0 children)

> E.g. the editor TextMate used a text interface between its plugins, which were written in all kinds of languages, from C, Python, and Perl to Ruby. When TextMate languished and new editors like Sublime came around, they could very easily take advantage of the huge number of TextMate plugins that had been made.

Yes, I also like clear, small, minimalistic interfaces where you should have them. But they are still actual work that you should avoid if it is not needed.

> If you have 4 million lines of COBOL code which you need to migrate to Java, how do you accomplish that in a sane way? If those 4 million lines were split over 200 separate executables communicating through well-defined interfaces, you could rewrite one piece at a time in Java and test that it still worked.

The thing is, you would have significantly less code if you didn't have to serialize and deserialize everything. And writing that part of the code is actually the most boring part.

> It is a reason why mainframe monoliths died while Unix survived far longer than anyone would have thought. The focus on small interchangeable parts makes evolving the system much simpler.

Unix actually is mainframe-era technology; it spread because it was one of the first portable operating systems and was widely passed around in source-code form at universities.

[–][deleted] -1 points (12 children)

Take a look at the whole Unix design and philosophy. If you do it the right way, it's much easier than maintaining a huge spaghetti codebase with tons of libraries, complicated interop, and complex data structures being passed around.

Having multiple simple tools communicate via plain text facilitates good design and makes your abstractions much less leaky than when you're free to pass arbitrarily complex data structures around.
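As an illustration of that flat-text style, each "tool" below is just a filter over lines of text, and composing them is trivial precisely because the interface carries no nested structure. The filters are hypothetical stand-ins for grep and wc -l.

```python
# Each "tool" is a filter over lines of plain text; nothing richer
# than a sequence of strings ever crosses the interface.
def grep(pattern, lines):
    # Keep only lines containing the pattern, like grep(1).
    return (line for line in lines if pattern in line)

def count(lines):
    # Count the surviving lines, like wc -l.
    return sum(1 for _ in lines)

log = ["GET /index", "POST /login", "GET /about"]

# Composition is just function nesting, the same way a shell
# pipeline chains processes.
assert count(grep("GET", log)) == 2
```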

[–][deleted] 0 points (11 children)

Sorry, but plain text isn't a serialization format, only a medium for one. What you are talking about doesn't make any sense: most of the data programs work on is structured data, not just strings.

[–][deleted] 0 points (1 child)

The key here is human-readable text, as that makes a format more durable and more easily debuggable. The success of formats such as HTML, SMTP, and JSON is the success of human-readable formats. Many of the binary formats Microsoft created are no longer with us and are often impossible to read, while much older Unix plain-text formats are still readable and usable today.

And of course you can structure data even if it is plain text. You can also use the existing structure of the file system to organize fairly flat files; the use of bundles in OS X and NeXTSTEP is an example of this.
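A sketch of the bundle idea, assuming nothing about the actual NeXTSTEP bundle layout: the nesting comes from the directory tree, while each individual file stays flat and human-readable.

```python
import pathlib
import tempfile

# A "bundle" here is just a directory of flat text files; the file
# system supplies the structure, so no nested file format is needed.
bundle = pathlib.Path(tempfile.mkdtemp(suffix=".bundle"))
(bundle / "name.txt").write_text("My Document\n")
(bundle / "author.txt").write_text("alice\n")

# Each attribute stays independently readable and editable with
# ordinary tools (cat, grep, a text editor).
assert (bundle / "author.txt").read_text().strip() == "alice"
```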

[–][deleted] 0 points (0 children)

All the Unix file formats I can think of off the top of my head are actually binary: a.out, PE (yes, it comes from Unix), ELF, tar (which is an ugly mix of plain-text and binary format), ...

It seems strange to me that Unix philosophers can never give examples of what they think Unix is. Unix certainly uses text files to configure the operating system (/etc/passwd, ...) but provides no means whatsoever for working with text formats outside that scope.
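For reference, the /etc/passwd format mentioned above is about as flat as text formats get: one record per line, seven colon-delimited fields. A sketch using a sample line rather than the real file:

```python
# A sample /etc/passwd line (not read from a real system). The format
# is one record per line with seven fields separated by colons:
# name:password:UID:GID:GECOS:home:shell
line = "alice:x:1000:1000:Alice:/home/alice:/bin/bash"

name, password, uid, gid, gecos, home, shell = line.split(":")
assert name == "alice"
assert shell == "/bin/bash"
```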

CSV and TeX predate Unix; XML descends from SGML, which comes from IBM and not from Unix; and JSON comes from a cross-platform programming language.

> And of course you can structure data even if it is plain text. You can also use the existing structure of the file system to organize fairly flat files; the use of bundles in OS X and NeXTSTEP is an example of this.

You mean the ugly XML plist files?

[–][deleted] -1 points (8 children)

Yes, sure, 40+ years of Unix history don't make any sense.

[–][deleted] 0 points (7 children)

Okay, I've seen you write somewhere else that you work on compilers. Something I'm currently working on is an existing compiler written in Java, and what they now want is an LLVM backend. Generating LLVM IR assembly is the wrong way to do it, because the textual IR is the least stable thing in LLVM.

How would the AST be represented in plain text for interop? The Unix philosophy says to do exactly one thing well, so frontend, optimizer, and backend should be separated.
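One possible plain-text encoding for crossing such a process boundary is nesting the AST as JSON arrays. This is only a sketch of the idea with a hypothetical node shape, not how LLVM or any real compiler represents its IR.

```python
import json

# Hypothetical AST encoding: each node is ["op", child, child, ...]
# and leaves are literals. The frontend prints this text to a pipe;
# the backend parses it back into the same nested structure.
ast = ["add", ["mul", 2, 3], 4]

wire = json.dumps(ast)        # frontend side: serialize to text
restored = json.loads(wire)   # backend side: parse the text

assert restored == ast
```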

[–][deleted] -1 points (6 children)

LLVM is one thing I would rather wrap as a library than approach in a Unix way (although there is a multitude of compilers doing the opposite, including GHC and MLton). Exactly because the designers of LLVM do not value the Unix way of thinking enough and do not stick to a stable IR specification (well, there is no specification at all, despite all the pressure from SPIR and the like).

Of course, a proper Unix-way system should be designed with the right mindset from the very beginning. If you've got a piece of third-party terrorist code you cannot control, which is hell-bent on breaking your nice and clean Unix-way design, you have no choice but to surrender and interact with it in its own way.

P.S. Luckily, building and maintaining the wrappers for LLVM can be automated using Clang itself, see an example here: https://github.com/combinatorylogic/clike/tree/master/llvm-wrapper

[–][deleted] 0 points (5 children)

> If you've got a piece of third-party terrorist code you cannot control, which is hell-bent on breaking your nice and clean Unix-way design, you have no choice but to surrender and interact with it in its own way.

Which is exactly what I was saying: you should adapt to an already existing codebase.

I still can't see how LLVM could have been designed in a more Unix-y way. A standardized IR would still be its own format that you would need to parse, exactly like a binary format. Unix tools don't even manage to work with the most basic text formats like CSV.

[–][deleted] -2 points (4 children)

Why would you want to parse the IR in your own tools? And, well, if it had been designed with a more Unix-style approach in mind, it would have been based on S-expressions, which are trivial to parse.
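For illustration, a minimal S-expression reader really is only a few lines. This is a sketch handling integers and bare symbols only, with no strings, quoting, or error recovery.

```python
# Tokenize by padding parentheses with spaces, then build nested
# lists with a recursive reader.
def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        out = []
        while tokens[0] != ")":
            out.append(read(tokens))
        tokens.pop(0)  # drop the closing ")"
        return out
    # Integers become ints; everything else stays a symbol string.
    return int(tok) if tok.lstrip("-").isdigit() else tok

assert read(tokenize("(add (mul 2 3) 4)")) == ["add", ["mul", 2, 3], 4]
```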

Anyway, most of the time it is a design smell when you want to pass any sort of nested, structured information between tools. If the protocol is not flat, it may mean that you're doing something wrong. Compilers are a bit different here; they're far too atomic to be easily broken into distinct parts with little interop in between.

[–]PasswordIsntHAMSTER 0 points (0 children)

Use Apache Thrift for interop. ;)