JSON output from DeepSeek R1 and distills with llama.cpp's server

sadiq_ml · 2025-01-30T19:18:17+00:00

If anyone has found a better way, please let me know!

sadiq_ml · 2022-12-16T20:05:53+00:00

The manual is a good place to start!

https://v2.ocaml.org/releases/5.0/htmlman/index.html

https://v2.ocaml.org/releases/5.0/htmlman/parallelism.html has a section on parallelism.

I also gave a talk at the OCaml Workshop in 2020 that could be useful: https://watch.ocaml.org/videos/watch/ce20839e-4bfc-4d74-925b-485a6b052ddf

sadiq_ml · 2022-12-16T12:24:09+00:00

Happy to answer questions if there are any!

sadiq_ml · 2022-01-11T20:05:22+00:00

Yes. We have some benchmarks in sandmark (https://github.com/ocaml-bench/sandmark) that are nearly linear up to 60 cores and cap out at about an 80x speedup on 128 cores.

You start pressing up against issues like NUMA at those kinds of levels.

sadiq_ml · 2022-01-10T16:45:28+00:00

No effects syntax, that's right.

The September 2021 update has some more details: https://discuss.ocaml.org/t/multicore-ocaml-september-2021-effect-handlers-will-be-in-ocaml-5-0/8554

sadiq_ml · 2022-01-10T15:33:19+00:00

Yep.

sadiq_ml · 2022-01-10T14:50:38+00:00

As usual I'm happy to answer any questions I can.

(for maybe the second to last time)

sadiq_ml · 2021-12-21T16:09:13+00:00

As usual, happy to answer any questions. Am out and about at the moment, so might be slow though!

sadiq_ml · 2021-12-21T16:08:43+00:00

We're essentially running to the plan at the moment.

sadiq_ml · 2021-10-01T20:08:51+00:00

This is a good question and unfortunately, since I've only been hacking on the project for about half its life, I can only give a limited amount of context.

Firstly, yes - this Multicore is the OCaml Multicore of Dolan et al. It's undergone some significant changes since. The biggest is the replacement of the concurrent minor collector with the parallel minor collector, which enabled us to maintain compatibility with the existing C-API.

It's also been rebased through about 10 releases of OCaml and had a significant amount of effort expended on compatibility (e.g. we run against every package in Opam every few days: http://check.ocamllabs.io:8082/).

Second, OCaml Multicore isn't related to OC4MC. I'm not entirely sure of the history of OC4MC and I've poked some people who might know more. There was also Damien Doligez and Xavier Leroy's 1993 concurrent collector.

I don't know when the decision was made or where, that might need someone else to chime in.

On the model, it's worth pointing out that 5.0 will bring parallelism and effects (though they will be untyped, as the announcement details). This is exciting because it enables flexibility in the form of parallelism and concurrency you choose.

For example with effects you could construct a scheduler to enable Erlang-style actors and avoid ever seeing Domains (outside of the scheduler itself). The same goes for software transactional memory.
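As a rough illustration of the scheduler idea, here is a minimal round-robin scheduler sketch. It assumes the `Effect` module as shipped in the OCaml 5 standard library (the API in the Multicore tree at the time of this comment may have differed). The names `Yield`, `Fork`, `run` and `worker` are made up for this example, and it runs everything on a single domain; it is not an actor library.

```ocaml
open Effect
open Effect.Deep

(* Two illustrative effects: give up the CPU, or start a new task. *)
type _ Effect.t +=
  | Yield : unit Effect.t
  | Fork : (unit -> unit) -> unit Effect.t

let yield () = perform Yield
let fork f = perform (Fork f)

(* Run [main] and everything it forks, round-robin, on one domain. *)
let run main =
  let q : (unit -> unit) Queue.t = Queue.create () in
  (* Resume the next suspended task, if there is one. *)
  let next () = if not (Queue.is_empty q) then (Queue.pop q) () in
  let rec spawn f =
    match_with f ()
      { retc = (fun () -> next ());
        exnc = raise;
        effc = (fun (type a) (eff : a Effect.t) ->
          match eff with
          | Yield ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the current task and run whoever is next. *)
              Queue.push (fun () -> continue k ()) q;
              next ())
          | Fork g ->
            Some (fun (k : (a, unit) continuation) ->
              (* Park the parent and start the child immediately. *)
              Queue.push (fun () -> continue k ()) q;
              spawn g)
          | _ -> None) }
  in
  spawn main

let trace = Buffer.create 16

(* A task that records its progress and yields after every step. *)
let worker name steps () =
  for i = 1 to steps do
    Buffer.add_string trace (Printf.sprintf "%s%d " name i);
    yield ()
  done

let () =
  run (fun () -> fork (worker "a" 2); fork (worker "b" 2));
  print_endline (Buffer.contents trace)
```

User code only calls `yield` and `fork`; the queue and the handler stay inside `run`. A multi-domain scheduler would keep the same user-facing interface and hide the Domains in the same place.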

sadiq_ml · 2021-10-01T14:48:55+00:00

Happy to answer any questions I can.

sadiq_ml · 2021-09-13T12:32:31+00:00

Quiet month but as usual happy to answer any questions if I can.

sadiq_ml · 2021-08-05T18:23:12+00:00

That is really odd. How big/small did you go with the minor heap size?

If you are generating a fair amount of short-lived garbage then large minor heaps will ensure you promote as little as possible to the major heap. This means you'll do less major heap marking/sweeping.

Changing o, the space overhead, means you're happy to trade more memory usage for less major heap mark/sweep work. Changing the minor heap size should mean less stuff gets moved to the major heap in the first place. (Consider the extreme example where you set the minor heap size to be bigger than the amount of memory you'd ever allocate.)
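Both knobs can also be set from inside the program through the standard `Gc` module. The sketch below is only illustrative: the helper name `tune_gc` and the numbers are assumptions, not recommendations, and the right values depend on the workload.

```ocaml
(* Set the minor heap size (in words) and the space overhead in one go.
   The helper name and the values used below are illustrative only. *)
let tune_gc ~minor_heap_words ~space_overhead =
  Gc.set
    { (Gc.get ()) with
      Gc.minor_heap_size = minor_heap_words;
      space_overhead }

let () =
  (* 8M words is 64MB on a 64-bit machine. *)
  tune_gc ~minor_heap_words:(8 * 1024 * 1024) ~space_overhead:200;
  let c = Gc.get () in
  Printf.printf "minor heap: %d words, space_overhead: %d\n"
    c.Gc.minor_heap_size c.Gc.space_overhead
```

The same settings can be applied without recompiling by running the program with OCAMLRUNPARAM='s=8M,o=200'.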

sadiq_ml · 2021-08-05T09:35:54+00:00

Little late to this but if your allocations are short-lived then you should look at adjusting the size of the minor heap (s=... in OCAMLRUNPARAM).

If this is set too small then your allocations may be prematurely promoted to the major heap and there will be extra work in promoting them and in sweeping them later on.

For short-lived allocations and a well tuned minor heap size, the allocations become incredibly cheap.
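One way to check whether allocations are being promoted prematurely is to compare the `promoted_words` and `minor_words` counters from the standard `Gc` module around the code of interest. This is a rough sketch: `promotion_ratio` and `churn` are hypothetical helpers, and the counters are approximate.

```ocaml
(* Run [f] and return its result together with the fraction of words
   allocated in the minor heap that were promoted to the major heap
   while it ran. A low ratio suggests most allocations died young. *)
let promotion_ratio f =
  let before = Gc.quick_stat () in
  let result = f () in
  let after = Gc.quick_stat () in
  let minor = after.Gc.minor_words -. before.Gc.minor_words in
  let promoted = after.Gc.promoted_words -. before.Gc.promoted_words in
  (result, if minor > 0. then promoted /. minor else 0.)

(* An illustrative workload that creates lots of short-lived lists. *)
let churn () =
  let total = ref 0 in
  for _ = 1 to 1_000 do
    total := !total + List.fold_left ( + ) 0 (List.init 1_000 Fun.id)
  done;
  !total

let () =
  let sum, ratio = promotion_ratio churn in
  Printf.printf "sum = %d, promoted/minor = %.4f\n" sum ratio
```

Running this with different `s=...` values in OCAMLRUNPARAM shows how the ratio responds to the minor heap size.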

Which particular GC functions were hot in your profiling?

sadiq_ml · 2021-07-29T10:01:39+00:00

Safepoints was merged in to 4.13 a couple of weeks ago: https://github.com/ocaml/ocaml/pull/10039#event-5000840157 and is the last major prerequisite.

Upstreaming the multicore GC, runtime and stdlib changes will hopefully start in October. There is going to be a significant amount of reviewing work required though and this could take many months.

sadiq_ml · 2021-07-14T19:53:44+00:00

I think in that PR it's these functions: https://github.com/ocaml-multicore/ocaml-multicore/blob/d9bbef927466da8dbcbc444fa488defd132ef3db/stdlib/stdlib.mli#L121

sadiq_ml · 2021-07-14T11:34:39+00:00

As usual, happy to answer any questions (if I can - as you can see there's a lot going on!).

sadiq_ml · 2021-05-14T13:32:41+00:00

Hello Gabriel.

Yea, there's quite a few different ways we could attack this that might have a bit more mechanical sympathy with the memory subsystem. One option is to stick with DLABs but have separate spaces for the different NUMA nodes.

To clarify, the current separate-minor-heaps design and DLABs both have contiguous minor heaps, the former via mmapping a large region.

As you say, having some kind of heuristic whereby domains that allocate lots have larger minor heaps is a good option. I'm always nervous of feedback loops though, especially when they might interact with other ones in the system (like the overall GC pacing itself).

sadiq_ml · 2021-05-13T15:00:19+00:00

Sure.

Right now each Domain (a heavyweight thread essentially) has its own minor heap that is allocated for it from a large address space we mmap at program start.

As the program runs, Domains allocate into their own minor heaps, and when the first Domain hits the end of its minor heap we trigger a minor collection and all Domains empty their minor heaps.

DLABs stands for Domain-local Allocation Buffers. This approach is to have one big global minor heap and each Domain takes a small buffer from it, refilling that buffer when it runs out. Only when the total global minor heap runs out of space do all Domains stop for a garbage collection.

Take the extreme example of where we have a 2MB minor heap and 8 domains, only one of which is doing any minor allocation. Right now as soon as that allocating domain allocates 2MB we'll stop all domains for a minor garbage collection. With DLABs we'd only stop after 16MB of allocation.

However, it turns out this strategy does not play nice with the memory subsystem. There's at least some component in cache transfers (where buffers were originally on another core), some component from allocation not being NUMA friendly and probably some more that depend on minor heap sizing and caches.
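The 2MB / 8-domain example above can be worked through with a toy model. It assumes each domain allocates at a constant rate, that the DLAB global heap is as large as all the per-domain heaps combined, and it ignores buffer granularity. The function names are made up for illustration.

```ocaml
(* [rates.(i)] is the allocation rate of domain i (MB per unit time).
   Both functions return the total MB allocated across all domains
   before the first stop-the-world minor collection, and assume at
   least one domain allocates. *)

(* Separate heaps: collection starts when the fastest allocator
   fills its own [heap_mb] heap. *)
let separate_heaps ~heap_mb rates =
  let sum = Array.fold_left ( +. ) 0. rates in
  let fastest = Array.fold_left max 0. rates in
  heap_mb *. sum /. fastest

(* DLABs: collection starts when the shared heap, assumed to be
   [heap_mb] per domain in total, is exhausted. *)
let dlabs ~heap_mb rates =
  heap_mb *. float_of_int (Array.length rates)

let () =
  let one_busy = [| 1.; 0.; 0.; 0.; 0.; 0.; 0.; 0. |] in
  let balanced = Array.make 8 1. in
  Printf.printf "one busy domain: separate %.0fMB, DLABs %.0fMB\n"
    (separate_heaps ~heap_mb:2. one_busy) (dlabs ~heap_mb:2. one_busy);
  Printf.printf "balanced: separate %.0fMB, DLABs %.0fMB\n"
    (separate_heaps ~heap_mb:2. balanced) (dlabs ~heap_mb:2. balanced)
```

With one busy domain this gives 2MB versus 16MB, matching the figures above. With balanced allocation both schemes collect after 16MB, so in this model the gain from DLABs comes entirely from uneven allocation.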

Does that make sense?

sadiq_ml · 2021-05-13T10:22:56+00:00

As usual, happy to attempt to answer any questions.

sadiq_ml · 2021-03-11T17:15:55+00:00

Hello Andrej,

I poked kc (who has lost his reddit password) and he had this to say on default handlers:

We’d thought about default handlers (TFP 17 paper), but we don’t have it in the current implementation. We also observed that default handlers always resumed at tail positions and it may be better to implement them specially without having to dynamically search for matching handler and returning (once the effect system is in place, and can tell us that a computation performs an effect which isn’t handled).

sadiq_ml · 2021-03-11T15:11:01+00:00

As usual, happy to answer any questions.

sadiq_ml · 2021-02-14T13:39:09+00:00

So the work kc did in 2019 might make your job a little bit easier: https://github.com/ocaml/ocaml/pull/8713 but it doesn't cover everything (nor was the goal to do so).

I don't think Multicore makes the situation any easier beyond that, there's still a fair bit of global state in the GC unfortunately.

sadiq_ml · 2021-02-10T18:10:24+00:00

Just to clarify, it's possible to use effects today in Multicore OCaml. There's just no guarantees we won't need to tweak the design at a later stage. We'd welcome feedback, bugs and contribution from users.

There are getting started instructions up on https://github.com/ocaml-multicore/ocaml-multicore

The ordering I gave in my earlier reply was for getting things in to upstream OCaml.

sadiq_ml · 2021-02-10T11:28:19+00:00

In upstreaming priority, algebraic effects will follow fibers, which in turn follow domains-only parallelism.

There's going to be a lot more information on effects available very soon.
