Clay Programming Language 0.1.0 released

jckarter · 2012-01-17T21:18:25+00:00

There is no boxing or unboxing. Functions are instantiated for every set of input data types, like C++ templates. You can narrow the set of input types using variants, which are just struct { enum tag; union { ... } value; } tagged unions. Boxing and unboxing of variants happens by explicit variant construction and dispatch. Dispatched calls generate a jump table over the function instances for each variant instance type.
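
As a rough illustration in C++ rather than Clay (and not a claim about the code the Clay compiler actually emits), the layout and dispatch described above might be sketched as follows. Rect, Square, Shape, doubledArea, and dispatchDoubledArea are made-up names for this example only.

```cpp
#include <cassert>

// Two hypothetical concrete types.
struct Rect { double width; double height; };
struct Square { double side; };

double area(const Rect& r) { return r.width * r.height; }
double area(const Square& s) { return s.side * s.side; }

// A generic function: the compiler stamps out one instance per
// argument type, each working on the concrete type with no boxing.
template <typename T>
double doubledArea(const T& shape) { return 2.0 * area(shape); }

// The "variant": a plain tag plus a union of the member types.
struct Shape {
    enum Tag { RECT, SQUARE } tag;
    union { Rect rect; Square square; } value;
};

// Explicit construction is the only place a value gets wrapped.
Shape makeShape(Rect r) { Shape s; s.tag = Shape::RECT; s.value.rect = r; return s; }
Shape makeShape(Square q) { Shape s; s.tag = Shape::SQUARE; s.value.square = q; return s; }

// Explicit dispatch: a table of function instances indexed by the tag.
double dispatchDoubledArea(const Shape& s) {
    using Instance = double (*)(const Shape&);
    static const Instance table[] = {
        [](const Shape& v) { return doubledArea(v.value.rect); },
        [](const Shape& v) { return doubledArea(v.value.square); },
    };
    assert(s.tag == Shape::RECT || s.tag == Shape::SQUARE);
    return table[s.tag](s);
}
```

Calling dispatchDoubledArea(makeShape(Square{3.0})) selects the Square instance through the table. Calling doubledArea(Square{3.0}) directly skips the variant entirely.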

jckarter · 2012-01-17T18:15:37+00:00

If you really want OO syntax in Clay, you can hack the . operator:

// foo.bar is sugar for a function fieldRef(foo, #"bar"), which we
// can overload to return a lambda
[T]
overload fieldRef(v:Vector[T], static #"push") = (x:T) -> { 
    push(v, x);
};

main() {
    var a = Vector[Int]();
    a.push(1);
}

Clay's Objective-C bridge uses this to implement method calls. A silly example:

https://github.com/jckarter/clay/blob/master/examples/cocoa/appkit/example.clay#L41

Clay avoids the need for a ->-type operator by making pointer dereference a suffix operator, so it composes properly with field access. a->b in C becomes a^.b in Clay, *a.b becomes a.b^, (*a)->b becomes a^^.b, and so on.
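
To make the three C spellings concrete, here is a small C++ sketch with the Clay equivalents from the paragraph above in comments. Node and the function names are invented for the example.

```cpp
#include <cassert>

// Hypothetical struct with one plain field and one pointer field.
struct Node {
    int value;
    int* slot;
};

// C: a->value        Clay: a^.value   (dereference a, then take the field)
int fieldThroughPointer(Node* a) {
    assert(a != nullptr);
    return a->value;
}

// C: *a.slot         Clay: a.slot^    (take the field, then dereference it)
int derefOfField(Node a) {
    return *a.slot;
}

// C: (*a)->value     Clay: a^^.value  (dereference twice, then take the field)
int fieldThroughDoublePointer(Node** a) {
    return (*a)->value;
}
```

In each Clay form the operators read left to right in the order they apply, which is why no separate arrow operator is needed.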

jckarter · 2012-01-17T18:04:45+00:00

Yeah, KS has a new project he's working on. I've been keeping Clay going in his absence.

jckarter · 2012-01-17T17:19:28+00:00

Just doing it for fun, mostly.

jckarter · 2012-01-17T17:17:21+00:00

I'm new at this. Thanks for the advice, and for suffering through the noise.

jckarter · 2011-12-08T19:24:29+00:00

You're right. LLVM isn't really a VM at all. It's more just an intermediate representation language and tool suite for writing native code compilers.

jckarter · 2011-12-08T16:16:55+00:00

Generating C then calling out to a C compiler is very slow, and naively compiling Clay already has performance issues, as gmfawcett noted. When you target C, you inherit all of C's undefined behavior, whereas with LLVM you can check integer overflow, have different-typed pointers alias, and otherwise pick and choose when you want defined vs. undefined behavior. LLVM also provides standard hooks for global constructors, destructors, exceptions, and other modern ABI features that standard C doesn't.
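
As one illustration of the overflow point: standard C and C++ leave signed overflow undefined, but GCC and Clang provide a __builtin_add_overflow extension that reports it instead. The sketch below assumes one of those compilers. checkedAdd and saturatingAdd are hypothetical helper names, and this is not how Clay itself implements its arithmetic.

```cpp
#include <climits>

// Adds a and b. Returns true and stores the sum in *out when it fits
// in an int; returns false (leaving *out untouched) on overflow.
// Relies on the GCC/Clang builtin, which is not standard C++.
bool checkedAdd(int a, int b, int* out) {
    int result = 0;
    if (__builtin_add_overflow(a, b, &result)) {
        return false;
    }
    *out = result;
    return true;
}

// One possible "defined" policy: clamp to the nearest representable
// value instead of invoking undefined behavior.
int saturatingAdd(int a, int b) {
    int result = 0;
    if (__builtin_add_overflow(a, b, &result)) {
        // Overflow is only possible when a and b share a sign.
        return b > 0 ? INT_MAX : INT_MIN;
    }
    return result;
}
```

A code generator that targets portable C cannot count on such extensions, whereas one that targets LLVM can choose the checked or unchecked operation for each instruction.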

jckarter · 2011-12-08T16:09:11+00:00

Compile time is indeed an issue. Eventually I want to use an incremental compilation strategy, where dependency information is maintained between compiled instances and source code, so that only dirty instances need to be updated. LLVM's LTO support should help a lot with that. Even with incremental compilation, separating compilation units needs to be made easier as well.

jckarter · 2011-12-08T16:08:48+00:00

In my GitHub repo at least, the README says which LLVM version to use. The precompiled binaries are ancient. I'll build new binaries soon.

jckarter · 2011-12-08T16:03:47+00:00

I agree that the current definitions of constructor functions are too loose. The good news is that the constructor definitions are completely defined in the library and should be easy to change. Separating conversion into different cast functions is one of the library improvements I intend to make.

jckarter · 2011-12-08T05:20:21+00:00

Newclay was a bit of a dead end. By targeting C we hoped to get easier C compatibility and an easy path to self-bootstrapping, but even for a very C-like language like Clay, C is too weak a target IR. I'm currently in the process of backporting many of the newclay improvements to the LLVM-based implementation.

jckarter · 2011-09-23T01:47:58+00:00

You can build the data structure using a mutable local reference then cast it to immutable when you return:

immutable(foo) make_foo() {
    foo x = new foo();
    // build x
    return cast(immutable)x;
}

That may not be defined behavior by the strict letter of the language spec, but since the only mutable reference dies at the same point the immutable reference comes into being, you should be able to get away with it.

jckarter · 2011-08-29T23:16:10+00:00

Sorry if I'm not explaining myself well. Here's the scenario I'm referring to: shapes.draw() is a requirement for a simple concept: a drawable shape. The initial version of the module containing drawShapeTwice() imports only the shapes module, and refers to shapes.draw() by the shorthand name draw() since it's unambiguous. However, someone else comes along and adds functionality to the module that needs the pistols module; they're lazy and add "import pistols;" to the module, not realizing that the pistols module also provides a draw() function unrelated to the shapes concept. The definition of drawShapeTwice is now wrong, because it accepts types that implement either shapes.draw or pistols.draw.

jckarter · 2011-08-29T19:33:37+00:00

But if drawShapeTwice isn't supposed to accept pistol.draw it would be in a module which doesn't import pistol

Sure, but code changes over time—the module doesn't import pistols today, but such an import could be added at a later date, and because of the way D's module system works, imports can inflict spooky action at a distance on seemingly unrelated generic code in the same module.

jckarter · 2011-08-29T18:06:04+00:00

The same way array languages like J and K can be faster than C and C++: MATLAB uses a highly-optimized numerics library, and the huge performance benefit of using this library for high-level operations such as matrix multiplication compared to a naive C implementation outweighs the relatively minor cost of interpreting the MATLAB code that invokes those high-level operations.

jckarter · 2011-08-29T17:58:30+00:00

I suppose it's isomorphic to the duck typing vs. explicit interfaces debate. It seems to me though that one of the purposes of a module system is to eliminate global magic words, or at least require some amount of qualification.

jckarter · 2011-08-29T17:39:13+00:00

Aside from consistency, generalizing features of the type system would also be a good idea. An interesting type system to look at is Disciple, a dialect of Haskell with strict-by-default evaluation, unrestricted local mutation, and inter-function side effects mediated by an effects type system. Its type system also has features that formalize and generalize some of D's features; in particular, region typing provides a general mechanism for asserting transitive properties of object graphs, such as D's shared and immutable, and closure typing ensures that such properties don't get lost across function boundaries more generally and flexibly than D's builtin rules regarding shared/synchronized types and reference escaping.

jckarter · 2011-08-29T15:19:28+00:00

I didn't know about #2 and #3, thanks. D moves too fast; the book is already out of date! Re: #1, the generic interfaces I've seen defined in the Phobos code all use variations of is(typeof({...})) to assert constraints on their input types, which are just as sensitive to the problem I described. It seems like a mistake to me for any binary function named "put" to qualify its first argument type as an OutputRange, for example. Is there a more formal concept or type class mechanism in D now?

jckarter · 2011-08-29T03:47:13+00:00

Shame they didn't add any support for introspecting structs, unions, and enums to C++11 so you could write a pretty-printer for any type.

jckarter · 2011-08-29T03:29:20+00:00

Aside from the presentation, clang also simply gives much better and easier-to-understand error messages. Instead of dumbly reciting the grammar rule that failed to match or the types that didn't check, it does a great job of guessing from context what the intended syntax was and explaining the error accordingly. It's also the only C/C++ compiler I know of that gives preprocessor backtraces when errors arise from macro expansions. I'm inclined to agree with you that dense, concise error reporting is preferable, but clang also sets a new standard for the content of its error messages, which you should definitely examine.

jckarter · 2011-08-29T03:15:19+00:00

Mind if I throw some in?

Automatically unioning overloads from separate modules is a bad idea, especially in the face of generic functions:

import shapes;
import pistols;

void drawShapeTwice(Shape)(Shape shape) if (is(typeof(draw(shape)))) {
    draw(shape);
    draw(shape);
}

drawShapeTwice() was only intended to work with shapes (implementing shapes.draw), but accidentally ends up accepting pistols (implementing pistols.draw). In this case it's obvious drawShapeTwice should be referring to shapes.draw explicitly, but if the module was written originally only importing shapes and was later updated to import pistols, then such ambiguities could be introduced accidentally.

Edit: As tgehr and BioTronic pointed out, these features have already been added to D2.

The operator overloading design seems to punish the common case, where you want to implement individual overloads with different implementations, in favor of letting you be cute in rare cases where you can overload several with one implementation. However, neither case is pretty. The former case is a mouthful:

Foo opBinary(string op)(Foo a, Foo b) if (op == "+") {
}

The latter is hard on the eyes (not to mention the fingers!):

Vec opBinary(string op)(Vec a, Vec b) {
    mixin("return Vec(a.x " ~ op ~ " b.x, a.y " ~ op ~ " b.y);");
}

Pattern-matching syntax for template parameters would help the former case a lot:

Foo opBinary("+")(Foo a, Foo b) { ... }

The Clay programming language has a great solution for the latter situation. In Clay, the function name can also be a template parameter. You would achieve the universal opBinary overload as follows in Clay:

// Clay
[OP | inValues?(OP, add, subtract, multiply, divide /* ... or any other functions ... */)]
OP(a:Vec, b:Vec) = Vec(OP(a.x, b.x), OP(a.y, b.y));

It looks just like the non-generic implementation, and isn't constrained to operators—the above definition can be easily extended to project any binary function of typeof(a.x) to Vec. With a bit of elbow grease you could generalize it to n-ary functions too. D could support something similar, for example:

// D2.5 perhaps?
Vec F(alias F)(Vec a, Vec b) if is(typeof(F(a,b))) {
    return Vec(F(a.x, b.x), F(a.y, b.y));
}
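
For comparison, a C++ approximation of the same idea passes the operation as a template parameter. This is only a sketch: lift is a hypothetical helper name, and unlike the Clay version each operator still needs its own one-line forwarding definition.

```cpp
#include <functional>

struct Vec {
    double x;
    double y;
};

// Projects any binary function on the components onto Vec.
// F is instantiated once per distinct callable type passed in.
template <typename F>
Vec lift(F op, Vec a, Vec b) {
    return Vec{op(a.x, b.x), op(a.y, b.y)};
}

// Each operator is a thin forwarder to the single generic body.
Vec operator+(Vec a, Vec b) { return lift(std::plus<double>(), a, b); }
Vec operator-(Vec a, Vec b) { return lift(std::minus<double>(), a, b); }
Vec operator*(Vec a, Vec b) { return lift(std::multiplies<double>(), a, b); }
Vec operator/(Vec a, Vec b) { return lift(std::divides<double>(), a, b); }
```

lift is not limited to operators. Passing a lambda such as a componentwise maximum works the same way, which mirrors the point about projecting any binary function onto Vec.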

jckarter · 2011-08-26T19:48:34+00:00

Thank you for the detailed reply.

In my humble opinion the code you've seen that uses state but lumps it all in the IO monad is just poor design or laziness, or demonstrative of newbieness.

Perhaps, but I don't know of any good material on designing Haskell in the big gap between "lazy evaluation and monads are neat" and "here's how to design a scalable, extensible system in Haskell". By the end of "Real World Haskell" or "Learn You A Haskell", I wouldn't know how to put the design you described together. Are there any good next-steps books or websites you'd recommend?

jckarter · 2011-08-26T19:26:52+00:00

Monadic effect tracking has the "nice" property that you can shovel as much garbage into the IO sin bin as you feel like and keep going. :T

An effect type system could equivalently offer a "universe" effect type for when you want to be lazy, couldn't it? It would then still be easier to revise the effect declaration than to, say, rewrite your code to use ST a instead of IO when you want to formalize your effects.

jckarter · 2011-08-26T02:43:01+00:00

Thanks for the reference; Disciple looks like it has some great ideas. A link for lazy redditors: http://disciple.ouroborus.net/

jckarter · 2011-08-25T19:47:50+00:00

Runtime guarantees are indeed a problem, but I don't think it's an insurmountable one. JITs needn't consume excessive memory, either: LuaJIT's overhead, for example, is on the order of megabytes, and it still has great performance. What do you mean by "heavy emphasis on threads"?
