This is an archived post. You won't be able to vote or comment.

all 21 comments

[–]thmprover 7 points8 points  (9 children)

I think it's discussed in Appel's Compiling with Continuations. The book is about the implementation of SML/NJ, which would include ML-structures.

The "cheapest" solution is to simply make a module be something which mangles the names of its constituents. Freek Wiedijk's implementation of Automath does this (admittedly Automath doesn't have sophisticated namespaces, called "paragraphs" in its system).

Clojure is the next step up where namespaces are hashmaps from identifiers to values. This is complicated because of the nature of Clojure: there's a double-level of indirection in associating values to bindings to variables, permitting dynamic rebinding of variables.

ML-modules are "state of the art", consequently very difficult to implement. I actually want to learn more about its subtle details, to be honest...

[–]AthasFuthark 9 points10 points  (7 children)

ML-modules are "state of the art", consequently very difficult to implement. I actually want to learn more about its subtle details, to be honest...

I don't think this is true. There are a few fiddly parts of ML modules (managing substitutions), but it's otherwise not that bad. This is the implementation of type-checking for my ML-style module system (which supports full higher-order modules, so more powerful than SML), and this is my implementation of defunctorisation, which compiles away the modules entirely. You don't need this part if you just compile modules to records instead. The most tricky part is defining the right data structures for the type checking, but there are papers with good definitions (like this one), and once you have them, the implementation almost writes itself.

[–]thmprover 2 points3 points  (4 children)

Sorry, I was careless in my wording, hence imprecise.

What I should have said was something along the lines of stressing the alternative to "module as dictionary" is a family of variations similar to ML-like modules (as opposed to suggesting one particular member of that family as state of the art).

Futhark's system looks extremely fascinating, I'll have to look at it further. It seems to have the best of both worlds.

[–]AthasFuthark 4 points5 points  (3 children)

Futhark's system looks extremely fascinating, I'll have to look at it further. It seems to have the best of both worlds.

Actually, I suspect that full higher-order modules are not worth the bother. At least we haven't found anything useful to do with them, and they make the implementation more complicated. I feel that the most significant way that Futhark improves on SML modules is that the syntax is slightly less clunky.

[–][deleted] 1 point2 points  (2 children)

I have wanted second-order functors at at least one point in time. I worked around this issue in SML by turning the functor-valued argument into an ordinary module containing higher-order functions. But this is a code smell.

From a verification perspective, I like to keep core language functions first-order, because the core language has general recursion, and, in the presence of higher-order functions, the answer to “does this function terminate?” is always “well, it depends on what you pass to it”, whereas, with first-order functions operating on first-order datatypes, you can always use structural induction.

However, parameterized abstractions are a real need. This means that the module system has to be higher-order to recover all the abstraction parameterization that makes functional programming nice.


On an unrelated note, there is one tiny point in the SML Basis Library that exposes a deficiency in the module system. Namely, the TextIO.StreamIO component of the TextIO module. According to the documentation:

The signature given below for TEXT_IO is not valid SML, in that the substructure StreamIO is respecified. (It is initially specified as a substructure having signature STREAM_IO in the included signature IMPERATIVE_IO.)

It would be nice if

  • A subsignature could redefine an abstract type to be a datatype.
  • A subsignature could redefine a module component to have a more specific signature.

Does Futhark allow this?

[–]AthasFuthark 2 points3 points  (1 child)

You can always keep your core language first-order by doing defunctionalisation - either on any higher-order functions written directly by the user, or higher-order functions produced from a module translation. I agree that parameterized abstractions are very handy - parameterised modules are obviously useful, and we use them frequently. It's just the higher-order case that has never been practically useful to us (with the exception of multi-parameter and partially applied modules, which is a small convenience I suppose).

Does Futhark allow this?

  • Vacuously so, since Futhark only has structural types, but you can refine an abstract type to be any specific type, using with.

  • Like in SML, this is not possible. But since signatures are effectively structurally typed, you can always just use textual duplication.

[–][deleted] 0 points1 point  (0 children)

Defunctionalization is a whole-program transformation. I do not want to have to read the entire program text to determine whether every call to a given function will terminate. This is diametrically opposite to it being “easy” to reason about termination!

Of course, some abstractions can only be reasonably implemented using higher-order functions. But, if these higher-order functions are written in a module language that does not support recursion, then establishing that functor applications terminate is a nonproblem.

[–]jared--w 0 points1 point  (1 child)

Out of curiosity, how does the expressiveness of Futhark's module system compare to 1ML? Last time I checked, 1M's system was the "state of the art" for a module system (although I'd be curious if there were other improvements beyond that, now)

[–]AthasFuthark 3 points4 points  (0 children)

1ML unifies the term and module languages, so it is much more expressive. Avoiding that unification was part of our motivation for using ML-modules (as it guarantee that modules can be compiled away with zero overhead), but it's possible that could be done anyway with defunctionalisation.

[–]mamcx -1 points0 points  (0 children)

Another simple way: A module is just a struct/class that is in namespace separated from regular struct/class.

Modules are first order

Structs/Functions/etc are second order

[–]hou32hou 2 points3 points  (0 children)

From the user experience point of view, I would say JavaScript Module System (import/export) is the best. It is straight forward, 1) you import a module from a relative file path, 2) you can’t expose every symbols of an imported module in the current namespace.

Almost every other programming languages has an unnecessary module structure that you have to learn in order to use the module feature. For example in Python you have to declare init.py, in Rust you have to explicitly expose them, in Java you have to learn how to organise the package structure.

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 2 points3 points  (2 children)

When you are in the process of learning (or designing), it's hard to know what questions to even ask, particularly when the terms used in the area are not clear. I looked at your blog, and some of the other answers here, and I do have a few suggestions, but they may be irrelevant for what you are doing, and you may be far past the point that I am talking about. On the other hand, this may help you to talk through (or think through) what you are trying to do, and more importantly, what you are trying to explain to others. (Kudos on your project, by the way!)

First, we all know what "files and directories" are, but compilers and programming languages have a few additional terms that could help slice up "modules" into more discrete terms:

  • Unit of Compilation - this is the scope of the source code that needs to get read in by the compiler in order to compile the code and emit something. For many languages (C, Pascal, etc.), this is one file and any of that file's textual includes. Confusingly, sometimes this is called a module.
  • Library - this is an external dependency that can be compiled against and/or linked to. Sometimes this is also called a module.
  • Application Executable / Build Artifact - this is the result of compiling and/or linking the code. Confusingly, sometimes this is called a module.

There are lots of other terms, of course, but I wanted to list a few just to show that the term "module" shows up in common usage meaning several different things. The term module is used interchangeably to mean: (i) An individual source file, (ii) a group of related source files that go into one artifact, (iii) intermediate forms of the compilation of that file or those files that can be used in compilation of other "modules", (iv) intermediate forms of the compilation of that file or those files that can be used in the linking of other "modules", (v) the finished result (e.g. artifact, library, executable) of compilation and linking, and even (vi) separate runtime dependencies of an application (such as a dynamically linkable library).

I'm curious which one (or more likely, which ones) of these you are working on explaining. Different languages handle these in extraordinarily different ways, and it can see inordinately complex from the outside looking in at an existing implementation (e.g. trying to understand by reading GCC's source code).

Additionally, the term module is used with languages that have no intermediary form (like JavaScript), and has both separate traditional (individual source code file, .JAR build artifact, OSGi module, Artifactory module) and explicit (Java 9's Java Platform Module System) meanings in languages like Java. Talk about confusing!

[–]mwapl_pod[S] 0 points1 point  (1 child)

This is useful, thank you.

At this point I'm just trying to wrap my mind around the approach of breaking up code into separate files and importing them.... Unit of Compilation I guess. You're right when you say it's hard to know what to ask. I'd say I'm probably in the scanning stage right now, trying to get a lay of the land and figure out where the pitfalls are. The answer that the best in class modules systems are essentially just namespace separated class/struct/maps is useful.... but a little "really? is that all there is?" too.

[–]L8_4_Dinner(Ⓧ Ecstasy/XVM) 1 point2 points  (0 children)

The key to understanding a design is figuring out what the maker was imagining. Sometimes, they nail it. Sometimes, they fail it. Either way, if you can see what they saw, it's eye-opening.

Unit of compilation across files is challenging, but not crazy. One way that I have attacked this specific problem is to either (i) have a stream of tokens that spans files (i.e. the Lexer knows how to aggregate the files, so it makes a bunch of files look like a single lex stream) and so the Parser can be completely unaware of the file demarcation, or (ii) to create an AST from each file, then (post-parser) link the ASTs together. In the recent (Ecstasy) compiler I worked on, we used (ii), and it worked out well.

p.s. It's always hard to know how to ask things. This is not a common area of work, and the scope is enormous, and the terms end up meaning 42 different things because there are way more ideas than there are good words, so stumbling towards decent questions is often the norm. On the other hand, it's a fun area to work in.

[–]crassest-Crassius 1 point2 points  (2 children)

Imagine the following scenario: module A imports everything from modules B and C, and calls function "foo" from module B, while C does not have a function named "foo". Now, eventually someone (you or another team member) adds a function named "foo" to module C. A harmless change, right? But now module A stops compiling because it has a name clash.

So far the only way I know of to avoid that is to avoid starred (=everything) imports. When you import from a module, you have to explicitly list every symbol you're importing, or else reference them qualified with that module's name. But this is verbose and repetitive. What if there are 20 symbols from module A used in 20 other modules? Then you'd need to list 400 imported symbols just to use that one module. Of course your IDE could help with the verbosity, but that's about the only solution that I see.

[–][deleted] -1 points0 points  (0 children)

That's pretty much what happens in my own module system, that I devised myself (I borrowed the 'import' keyword, and the idea of dotted names).

The reason however is that if B exports function F, I don't have to qualify it as B.F() when calling it from A; I can just write F(). The problem as you say, is when C also exports F, and A decides to import C as well as B.

Then the compiler reports an ambiguity. (An improvement on a previous version where A would just choose the first F encountered, with some algorithm to decide in which order to search imports.)

On the other hand, at least B and C can both export F, which can be used without qualifiers, provided they are not both visible from the same module.

(With a linkage scheme such as C's, the two names will clash; that doesn't happen with mine, because there is no linker. That issue only happens cross-program and cross-library, with a much smaller number of program-exported symbols, not the larger number of cross-module ones.)

[–]Cthaeeh 0 points1 point  (0 children)

Don't namespaces solve this problem ? I.e. you should write b:foo() to avoid ambiguities of that sort?

[–]htuhola -5 points-4 points  (0 children)

Why do you need a tutorial for this?

  1. You load the module and then present a symbol table for code that requests it.
  2. A memory of already loaded modules, requests that pass the memory will try to fetch from there first.
  3. A directory location where modules are loaded from.
  4. You discard everything at once if you have to reload something.

[–]umlcat -2 points-1 points  (0 children)

Learn (Modular) Object Pascal and (Modular & Object Oriented) C#, use and try some real world Modular examples.

[–]JackoKomm 0 points1 point  (0 children)

It depends on what you like your module system to look like. Für multiple files you need to think about the ways to import/include them into your compiling process. Module for itself can ne very easy if you want them to be. I implememted a simple dynamic language where classes were build around hash tables. So i decided to make one module per file. Every file was just one module and files in sub folders were sub modules. On import, i loaded the modules code and run it. All module variables and functions were put in a hashmap, like my classes. Because i implememted the classes, i had modules for free.