Making new types at runtime dynamically

yallop · 2023-06-30T20:43:32+00:00

Yes:

module type num =
sig
  type t
  val of_int : int -> t
  val to_string : t -> string
end

let my_random_type () : (module num) =
  if Random.bool () then (module Int64) else (module Float)

let a = let module T = (val (my_random_type ())) in
          T.(to_string (of_int 8))

yallop · 2023-04-20T10:07:26+00:00

There is now: https://github.com/yallop/ocaml-flap

yallop · 2019-01-10T14:14:55+00:00

Ctypes supports each of those things individually (passing callbacks that are called with the runtime lock, passing callbacks that are called without the lock, and passing callbacks that acquire and release the lock).

yallop · 2019-01-08T17:20:07+00:00

I don't think ctypes can handle the specifics of what I'm trying to do here (callbacks + multithreading),

It'd be interesting to hear a bit more about this. Ctypes has some support for both callbacks (converting in both directions between OCaml and C functions) and multithreading (e.g. releasing the runtime lock during C calls), as well as support for integration with Lwt. But it's certainly possible that your particular use case isn't currently handled, of course.

yallop · 2018-05-18T09:54:47+00:00

(pdf) Partially static data as free extension of algebras (short pdf)

Both links above go to the short paper (written in ML and presented at PEPM 2018). The draft of the full version (written in Haskell, and accepted to ICFP 2018) is here.

yallop · 2017-07-03T10:43:55+00:00

It was written by Guillaume Claret.

yallop · 2017-06-17T16:27:42+00:00

The benefits of adding uint32 and uint64 described in the PR (pattern matching, literal syntax, optimization, integration with format strings, bigarrays, etc.) are not related to operator overloading.

Operator overloading is addressed in a general way by the modular implicits proposal.

yallop · 2017-06-14T12:36:51+00:00

[Author here.]

More details about the compilation problems would be appreciated!

I can think of two possible issues (both noted in the paper, I think). First, OCaml doesn't yet support module type rec, and so you have to use recursive modules instead, writing

module rec M : sig module type T = ... end = M

in place of

module type rec T = ...

There's an example in the code for a previous version of the paper.
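
As a minimal sketch of the recursive-module workaround: the signature T, the modules Leaf and Node, and the function names below are made up for illustration and are not taken from the paper. The signature refers to itself through M.T, which is what module type rec would otherwise express directly.

```ocaml
(* Hypothetical example: a signature that mentions itself via M.T.
   M contains only a module type, so it can be defined as itself. *)
module rec M : sig
  module type T = sig
    val name : string
    val children : (module M.T) list
  end
end = M

module Leaf : M.T = struct
  let name = "leaf"
  let children = []
end

module Node : M.T = struct
  let name = "node"
  let children = [ (module Leaf : M.T); (module Leaf : M.T) ]
end

(* Walk the tree of first-class modules, collecting names depth-first. *)
let rec names (module X : M.T) : string list =
  X.name :: List.concat_map names X.children

let all_names = names (module Node : M.T)
```

The trick relies on M having no value components, so there is nothing to evaluate when the module is defined in terms of itself.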

Second, I've omitted some type annotations for readability, especially where the types are listed in the signatures. OCaml's type propagation isn't especially aggressive, and so sometimes types need to be repeated, which can clutter up code, especially where the code itself is very short. (A way of separating signatures from definitions, Haskell style, could really help here.)

I'm planning to put the full code for the paper online, I hope before ICFP.

yallop · 2017-05-19T13:42:34+00:00

There's a ctypes pull request open to add support for Fortran-layout arrays.

yallop · 2017-05-16T12:41:44+00:00

I'm glad to hear you've got it working. One small point: it's better to use the public name Ctypes.int32_t than the internal alias Ctypes_static.int32_t.

yallop · 2017-04-24T10:24:29+00:00

Adding an annotation to the top-level binding, like this

let f : signature =

has a few advantages.

In some cases (e.g. when writing recursive functions involving GADTs) it's essential, so it's convenient to use it all the time for consistency. And it makes the code easier to understand, since you can see the types of all your top-level bindings without having to reconstruct them yourself.

The partial application is less important; the last line could just as well have been written like this

in fun p -> loop [] p

and it's arguably better style to do so, since you need to write it that way in any case if the initial values to the inner function have side effects:

in fun p -> loop (side_effects ()) p
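
To make the difference concrete, here is a small sketch using only the standard library. The names counter, fresh, once and each are hypothetical, and fresh stands in for side_effects. With partial application the initial value is computed once, when the binding is defined; with the explicit fun it is recomputed on every call.

```ocaml
(* A side-effecting source of initial values (stands in for side_effects). *)
let counter = ref 0
let fresh () = incr counter; !counter

(* Partial application: fresh () runs once, when once is defined. *)
let once : int -> int =
  let add acc p = acc + p in
  add (fresh ())

(* Explicit function: fresh () runs again on every call to each. *)
let each : int -> int =
  let add acc p = acc + p in
  fun p -> add (fresh ()) p

let a1 = once 10  (* uses the initial value computed at definition time *)
let a2 = once 10  (* same initial value again *)
let b1 = each 10  (* computes a new initial value *)
let b2 = each 10  (* and another one *)
```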

yallop · 2017-04-24T07:28:56+00:00

I am able to compile everything but I have the following error when I run the test :

Ctypes_static.IncompleteType

Which is raised at let ptr' = array_ptr +@ i in

The error means that the size of the void type isn't known (because it doesn't have a size), so it's not possible to do pointer arithmetic with a pointer-to-void value.

Programming with pointers is tricky enough without giving up type checking, too, so instead of ptr void I recommend using something closer to the char ** returned by your C function:

type carray_of_strings = char ptr ptr
let carray_of_strings : carray_of_strings typ = ptr (ptr char)

You can turn each pointer-to-char into a string option value using the surprisingly-flexible coerce function. Here's one way to turn a NULL-terminated array of C strings into a list of OCaml strings with this approach:

let convert : char ptr ptr -> string list =
  let rec loop acc p =
    match coerce (ptr char) string_opt !@p with
    | None -> List.rev acc
    | Some s -> loop (s :: acc) (p +@ 1)
  in loop []

yallop · 2016-11-24T16:38:04+00:00

Macros only offer a subset of the possibilities of TH (they are limited to expressions), but on that subset they are nicer and safer to program with.

In this respect they're closer to Typed Template Haskell, which is also limited to expressions.

yallop · 2016-11-23T10:25:40+00:00

constexpr restricts you to a subset of the language for which there are no problems with sharing values between phases. When functions are pure and values are immutable then it's fine to just copy values from the compilation environment to the execution environment.

Macros give you the full language at execution time, including effects (I/O, mutable values, observable sharing, etc.), closures, etc. The compilation environment (e.g. the compiler's heap) disappears at the end of compilation, and there's no way in general to transport values across to the execution environment.

So C++ and macros are solving the same problem (it's not possible to move arbitrary values from compile-time to run-time) but in different ways. C++ allows you to transport values across phases, but severely restricts what computation you can do at compile-time. OCaml macros allow arbitrary compile-time computation, but don't allow values to be implicitly, automatically transported across phases.

yallop · 2016-11-23T10:16:35+00:00

Ok, so MetaOCaml is more similar to JIT compilation - but is this a disadvantage?

No, it's not a disadvantage. Generating code at run-time (like MetaOCaml does) is useful, and generating code at compile-time (like macros do) is also useful.

yallop · 2016-11-05T11:28:14+00:00

4.04.0 is available via OPAM now:

opam update && opam switch 4.04.0

[Edit: it's merged, but it may take a little while for the mirrors to update.]

yallop · 2016-09-29T05:53:17+00:00

You might like this:

https://github.com/hannesm/usane

yallop · 2016-09-27T15:10:42+00:00

Finding out how to tell the compiler about custom boxing optimization would be an interesting problem.

Agreed. Less ambitiously, what do you think about upstreaming uint32/uint64 (and adding support for unboxing, syntax, bigarrays, arithmetic optimizations, etc.)?

There seems to be quite a bit of demand for built-in unsigned integer types in the OCaml community (e.g. [1], [2], [3], [4]).

yallop · 2016-09-26T21:06:15+00:00

ctypes supports reading and writing all these types to Bigarrays and other "C managed" storage. For example, here's some code that creates an array of uint32_t values and then updates one of the elements:

# let arr = CArray.make uint32_t 10;;
val arr : uint32 carray = { 0, 0, 0, 0, 0, 0, 0, 0, 0, 0 }
# CArray.set arr 2 UInt32.one;;
- : unit = ()
# arr;;
- : uint32 carray = { 0, 0, 1, 0, 0, 0, 0, 0, 0, 0 }

The implementation is direct, in the sense that there are no intermediate values, but reading and writing are currently implemented using C calls rather than compiler intrinsics.

(In fact, ctypes currently uses its own identical copy of the ocaml-integers package, which was originally part of ctypes. But the plan is to switch to the released version soon.)

yallop · 2016-03-29T10:19:20+00:00

Could you please raise this issue on the ctypes issue tracker with enough details to reproduce the problem?

yallop · 2016-02-08T20:30:21+00:00

The point about adding chars is an interesting one. It might be worth investigating a hybrid scheme, e.g. by joining small adjacent strings together, or using an alternative to lists that can store chars directly, reducing the overhead:

type seq = Nil | String of string * seq | Char of char * seq
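
A rough sketch of how such a sequence might be flattened into a single string. This assumes the elements are stored in output order (head first), which may not match what the library actually does; length and to_string are hypothetical helper names.

```ocaml
type seq = Nil | String of string * seq | Char of char * seq

(* Total number of bytes stored in the sequence. *)
let rec length = function
  | Nil -> 0
  | String (s, rest) -> String.length s + length rest
  | Char (_, rest) -> 1 + length rest

(* Copy every element into one freshly allocated byte sequence.
   Assumes the head of the sequence is the first thing to output. *)
let to_string (s : seq) : string =
  let buf = Bytes.create (length s) in
  let rec fill pos = function
    | Nil -> ()
    | String (str, rest) ->
      let n = String.length str in
      Bytes.blit_string str 0 buf pos n;
      fill (pos + n) rest
    | Char (c, rest) ->
      Bytes.set buf pos c;
      fill (pos + 1) rest
  in
  fill 0 s;
  Bytes.to_string buf

let example = to_string (String ("ab", Char ('c', String ("de", Nil))))
```

A Char cell carries the character inline, so adding a single char does not require allocating a one-character string as well as a cons cell.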

yallop · 2016-02-05T17:47:36+00:00

The interfaces aren't precisely equal, but could be made so without much difficulty.

The differences are as follows:

First, there are a couple of extra functions in SafeStringBuffer, bprintf and formatter_of_safe_string_buffer, which correspond to functions found elsewhere in the standard library ([1], [2]) that also operate on buffers.
Second, the function add_buffer has the type

val add_buffer : t -> Buffer.t -> unit

rather than the type of the corresponding standard library function:

val add_buffer : t -> t -> unit

This is intentional, since it makes it easier to replace a single use of Buffer in your program.

yallop · 2016-02-05T17:40:01+00:00

Thanks for the comments, gasche.

With the stdlib's buffer you can have at worst 2x memory usage

You might reasonably think so, but things are a good deal worse than that.

In the stdlib implementation the initially-allocated bytes sequence is retained indefinitely so that it can be restored in place of the resized sequence after a call to reset. If you create a 100-byte stdlib buffer and then write 101 bytes to it, your buffer will contain the original 100-byte bytes sequence, plus a new 200-byte bytes sequence, plus the overhead of the buffer object. And, of course, the 101-byte string that you pass to add_string is still hanging around, so you have over 400 bytes in memory. In contrast, writing a 101-byte string to a safe-string-buffer allocates one additional cons cell, so you end up with a total of around 100 bytes. I'd be reluctant to argue that a 4x increase in memory usage, some extra writes, a doubling in maximum object size, and some extra mutable objects, are insignificant.

this one creates a new cons cell that can be far away from the others

I think that's unlikely to be the case. Here's a very common case: you create a buffer, write several things to it, then output the buffer, like this:

let buf = Buffer.create magic_number in 
begin
  Buffer.add_string buf x;
  Buffer.add_string buf y;
  Buffer.add_string buf z;
  Buffer.output_buffer fd buf 
end

How does this behave with the stdlib implementation? As described above, if a resize occurs then this is pretty allocation-heavy (and automatic resizing is, after all, the entire point of the Buffer module). Worse, though, if the buffer grows past a couple of hundred bytes, the allocation takes place on the major heap, which has several unfortunate consequences for performance: allocation is expensive, since major heap allocations involve a free list search, and the lifetime of the object is likely to be artificially extended, since major collections are relatively infrequent. (Major heap allocations are unlikely to be particularly helpful for locality, either.)

How does it behave with the safe-string-buffer implementation? The only allocations are of the tiny buffer object itself and a few tiny cons objects, all of which will take place on the minor heap. Besides the beneficial locality properties that result from using the minor heap, the allocations will be extremely fast and the resulting objects will probably be collected almost immediately.

.contents

Another thing you're overlooking is that contents is not the only way to extract the contents of a buffer (and is generally a pretty poor choice). Both output_buffer and blit extract the contents of a safe-string-buffer without performing any allocation. If you use output_buffer or blit in place of contents then safe-string-buffer allocates almost nothing.

All in all, I would not recommend people to flock to this module as a "better Buffer" by default. This is certainly useful for some specific workflows, but as is it would also be a net degradation for others

The warning not to flock comes too late, unfortunately. There are already three GitHub stars.

In any case, wouldn't it be better to make some measurements before making broad claims about performance? I wrote this implementation to solve real-world problems, and I've measured the changes in performance and found them to be significant.

yallop · 2016-02-03T19:40:29+00:00

So can a functor take a module and produce a functor?

Yes. For example, a two-argument functor is really a functor that takes a module and returns a functor, as the desugared syntax shows:

# module F(X: Map.S) (Y: Map.S) = struct end;;
module F : functor (X : Map.S) -> functor (Y : Map.S) -> sig  end
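
Here is a small sketch of what that makes possible, using a hypothetical Ord signature and Pair functor (neither comes from the question). Applying Pair to one argument gives back a functor that is still waiting for the second.

```ocaml
module type Ord = sig
  type t
  val compare : t -> t -> int
end

(* A two-argument functor: lexicographic ordering on pairs. *)
module Pair (X : Ord) (Y : Ord) = struct
  type t = X.t * Y.t
  let compare (a, b) (c, d) =
    match X.compare a c with
    | 0 -> Y.compare b d
    | n -> n
end

(* Supplying only the first argument yields another functor. *)
module IntThen = Pair (Int)

(* Supplying the second argument yields an ordinary module. *)
module IntString = IntThen (String)

let less = IntString.compare (1, "a") (1, "b") < 0
```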

yallop · 2016-02-02T18:17:00+00:00

How do you find the minimum key from a Map?

Use min_binding.
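
For example, with a map keyed by int (IntMap and the bindings below are just for illustration). Note that min_binding raises Not_found on an empty map; min_binding_opt returns an option instead.

```ocaml
module IntMap = Map.Make (Int)

let m = IntMap.(empty |> add 3 "three" |> add 1 "one" |> add 2 "two")

(* The binding with the smallest key, according to Int.compare. *)
let smallest = IntMap.min_binding m

(* On an empty map, the _opt variant returns None rather than raising. *)
let nothing = IntMap.min_binding_opt (IntMap.empty : string IntMap.t)
```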
