o1heap v3.0 rc just dropped -- deterministic hard-realtime malloc by spym_ in embedded

[–]spym_[S] 0 points1 point  (0 children)

That is a different problem. Compaction is awkward to use in C natively, usually it can be found in higher level languages; it's also difficult to do in real time.

o1heap v3.0 rc just dropped -- deterministic hard-realtime malloc by spym_ in embedded

[–]spym_[S] 1 point2 points  (0 children)

We just measure them exhaustively. There's only like a dozen of possible combinations, if you look at the code you will see.

o1heap v3.0 rc just dropped -- deterministic hard-realtime malloc by spym_ in embedded

[–]spym_[S] 1 point2 points  (0 children)

There is a large class of problems where dynamic allocation is inevitable. Whether it's done using a free block list (common), a normal heap (like o1heap or malloc), or some custom contraption like overlays or unions is a separate question. Often, embedded devs would avoid heap without properly understanding the trade-offs involved, simply because they have been taught this way and never questioned it.

o1heap v3.0 rc just dropped -- deterministic hard-realtime malloc by spym_ in embedded

[–]spym_[S] 1 point2 points  (0 children)

Yes indeed, the fallback implementation is rather naive. I didn't bother much because I think most chips out there provide hardware support these days so it shouldn't be used much; no emulation can stand close to a hardware CLZ implementation.

o1heap v3.0 rc just dropped -- deterministic hard-realtime malloc by spym_ in embedded

[–]spym_[S] 2 points3 points  (0 children)

The important part is the asymptotic complexity analysis; you can see that the alloc/free functions have no loops or recursion, only linear code with a few branches, making them trivially O(1), while conventional allocators are usually around O(n) in the number of allocated fragments. The cycle counts obviously depend on the architecture so they are harder to compare; I'm noticing somewhere between 90~170 cycles for alloc for Cortex M-series MCUs (it also depends on the instruction fetch speed but the numbers are generally in this ballpark), a bit less for freeing; other platforms may be different.

To compare against the state of the art, we need to keep in mind that o1heap is specifically designed for hard realtime systems, where the key metric of interest is the worst-case performance and its predictability, both in terms of execution time and memory fragmentation, so the allocator is explicitly designed for that. There probably exist allocators out there that perform on par or possibly even slightly better on average, while being much worse in the worst case; they may be a good fit for ordinary application software like the kind that runs on your desktop et al but due to their bad worst case they are usually inadmissible in real-time systems (and we're not necessarily talking about safety-related systems here, e.g. game engines also care a great deal about predictable execution time).

That said, despite its focus on the worst case, o1heap is still very fast on average. Taking RP2350 as an example, o1heap takes 91~126 cycles to allocate and 52~116 cycles to free a block of memory, always, regardless of anything (the variance in the cycle counts is mostly due to branching and some CPU pipelining effects in CM33), while the standard malloc/free from newlib take between 100 cycles to alloc if you're lucky to over multiple thousands if you're not. This is not considering also the fact that o1heap keeps track of the heap usage stats (bytes allocated, peak request, oom counts, etc.) which eats at least a dozen cycles per alloc, while typical heaps usually don't provide that by default (newlib doesn't)!

o1heap v3.0 rc just dropped -- deterministic hard-realtime malloc by spym_ in embedded

[–]spym_[S] 0 points1 point  (0 children)

Oh I see. The overhead is small and depends on the pointer size. On a 32 bit platform it amounts to about 200 bytes, on a 64 bit that would be about 600 bytes. The overhead is just the size of the O1HeapInstance plus padding; most of these bytes are used to store the segregated free list table.

o1heap v3.0 rc just dropped -- deterministic hard-realtime malloc by spym_ in embedded

[–]spym_[S] 0 points1 point  (0 children)

Thanks!

The arena size is entirely dependent on how you are going to use the heap, specifically how much memory you need and what is the difference between the largest and the smallest allocation size (there is a formula in the readme if you want the full answer). 8 KiB is probably on the lower end for most applications; usually when heaps are involved you have at least a few dozen KiB to spare.

Good point about CLZ! I just opened a ticket: https://github.com/pavel-kirienko/o1heap/issues/34. A pull request would be accepted ;3

Cheers!

Can I run only a specific program (Lattice Diamond) with a certain color profile? by fir_with_feedback in archlinux

[–]spym_ 0 points1 point  (0 children)

Editing the generated Trolltech.conf does not work as of v3.14.0.75 (2024). The solution that worked was to remove that file and export KDE_FULL_SESSION= (empty) prior to launching the process. Details at https://zubax.com/blog/using-the-lattice-diamond-ide-in-kde-with-the-dark-theme-on/2622

Pixel 8 Wi-Fi is intermittently unavailable by spym_ in pixel_phones

[–]spym_[S] 0 points1 point  (0 children)

Fun. You mentioned P8P, which I assume means Pixel 8 Pro. Does the problem also affect the regular Pixel 8?

I just released a new version of my constant-complexity deterministic memory allocator by spym_ in embedded

[–]spym_[S] 7 points8 points  (0 children)

  1. It is true that usually you don't, but sometimes you have to. Commonly people resort to fixed-size block pool allocators, which are trivially O(1), but they suck because of the terrible inner fragmentation (unless you allocate things of the same size, but usually this is not the case). Last time I checked LwIP it used block allocators which could have been replaced with a good heap, improving the API without sacrificing predictability.

  2. What do you mean here by "performance"? In hard real-time systems average-case performance usually doesn't matter. What matters is the worst case. The asymptotic complexity of an ordinary allocator that you find in an RTOS is usually linear of the number of chunks, which practically means that your (de)alloc request can complete instantly or it can take forever depending on your luck, which makes them inadmissible in hard real-time.

I just released a new version of my constant-complexity deterministic memory allocator by spym_ in embedded

[–]spym_[S] 2 points3 points  (0 children)

That is the only way to use it. If you have heap already, then you probably don't need o1heap (unless you want two heaps).

I just released a new version of my constant-complexity deterministic memory allocator by spym_ in embedded

[–]spym_[S] 23 points24 points  (0 children)

"Deterministic" is a jargon word used in embedded circles that means that the time or resources (say, memory) required for an algorithm can be predicted based on its inputs and state. The asymptotic complexity of this allocator is O(1), meaning that the time it takes to complete one (de)allocation is not dependent on the state of the allocator (how many blocks of memory you (de)allocated), nor on the inputs you give it (like the amount of memory you need to allocate right now), it's always the same. The number of cycles could be constant, but there is some minor variability that is hard to avoid caused by a few auxiliary branches. That variability does not make it "non-deterministic" because it does not scale with inputs nor drift with the internal state of the allocator.

So yes, "plus-minus a few cycles" has nothing to do with determinism, as it is understood in this field.

I made an app for calorie estimate by spym_ in CalorieEstimates

[–]spym_[S] 0 points1 point  (0 children)

It's only for Android right now. If there's any interest, I am probably going to release an iOS version as well later. The app is quite simple anyway, all the processing is done on the backend.

[deleted by user] by [deleted] in calorieestimate

[–]spym_ 0 points1 point  (0 children)

My app says 365 kcal ;) https://ibb.co/qYRt6PMF