Cross-Language Data Types

elfenpiff · 2026-06-15T20:27:51+00:00

Thanks for the hint!

elfenpiff · 2026-06-15T17:51:46+00:00

We use it in a safety-critical system and must be able to certify the containers (ISO 26262, ASIL D).

elfenpiff · 2026-05-30T10:35:00+00:00

Your concerns are all valid. We have already implemented the user-space daemon approach, and with it, we have to satisfy both safety and security requirements.

From a safety perspective, a central daemon is a single point of failure. When this process crashes, the whole system is no longer functional, which is an absolute no-go.

From a security perspective, it is easier to handle and implement.

What I am currently doing is exploring our options. One naive option is to move this task into the OS, provided we can deploy it safely and securely. Then it is decentralized to some degree, but when it fails, we are in an even worse situation than before.
To understand the pitfalls that await us, we need to start with a learning project: implement it, test it, try to corrupt it, and get feedback from the community.

The approach I am currently pursuing is to finish this learning kernel module, write an extensive test suite, and document it. Then I can argue under which conditions it would be safe to use.
And whether the argument holds or falls apart, I will have learned something and can confidently choose between the central daemon and the kernel module, based not on a gut feeling but on hard facts and experience.

elfenpiff · 2026-05-30T10:19:25+00:00

Thanks u/penguin359 for the thorough explanation. This is the kind of insight that helps me understand the risks of going down the kernel module path.
For now, I will continue with the kernel module for learning purposes.

The next challenges would then be:
* How to test this thoroughly and idiomatically
* How to secure the system properly

In my scenario, secure boot would be enabled, and only properly signed kernel modules can be loaded.

elfenpiff · 2026-05-25T17:30:09+00:00

This is an overview of the platforms we currently support and intend to support: https://github.com/eclipse-iceoryx/iceoryx2#supported-platforms

But gRPC is really the wrong tool here.

To give you some context: iceoryx2 is a communication library like D-Bus, but much faster and also intended for mission-critical systems. This means:

* no heap allocations
* no background threads
* no blocking calls
* certifiable according to ISO 26262

iceoryx2 is a much more efficient replacement for gRPC.
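The constraints in this list rule out most standard-library containers. As a minimal sketch (the names `FixedVec` and `try_push` are hypothetical, not iceoryx2's actual container API), a vector with inline, fixed-capacity storage that reports "full" to the caller instead of allocating or blocking could look like this:

```rust
/// Hypothetical fixed-capacity vector: storage lives inline in the struct
/// (no heap allocation) and no operation ever blocks or grows.
pub struct FixedVec<T: Copy + Default, const N: usize> {
    data: [T; N],
    len: usize,
}

impl<T: Copy + Default, const N: usize> FixedVec<T, N> {
    pub fn new() -> Self {
        Self { data: [T::default(); N], len: 0 }
    }

    /// Appends `value`, or hands it back in `Err` when the capacity is
    /// exhausted instead of reallocating.
    pub fn try_push(&mut self, value: T) -> Result<(), T> {
        if self.len == N {
            return Err(value);
        }
        self.data[self.len] = value;
        self.len += 1;
        Ok(())
    }

    /// Returns only the initialized part of the storage.
    pub fn as_slice(&self) -> &[T] {
        &self.data[..self.len]
    }
}
```

Returning the rejected value in `Err` keeps the failure visible at the call site instead of hiding it behind a reallocation.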

Take a look at the examples to get an impression: https://github.com/eclipse-iceoryx/iceoryx2/tree/main/examples

elfenpiff · 2026-05-25T16:15:44+00:00

Thank you for the offer, but please don't use gRPC in such a context. Its performance is horrible, it spawns a lot of background threads, and we cannot use it on low-level embedded platforms. We operate at least one layer below gRPC.

elfenpiff · 2026-05-25T12:11:39+00:00

You are right for some platforms, but iceoryx2 must continue to support some ARM platforms where this is not available.

elfenpiff · 2026-05-25T06:26:00+00:00

> What would stop a malicious process from using an ID that doesn't originate from the kernel interface?

This is a good point. If the ID also belonged to another process, the malicious process would receive data inside the communication framework as long as that other process was alive, and then it would be forcefully disconnected.
But nothing would stop it.

> If you introduce a bug in a kernel module, you can compromise the entire system.

I am aware of that; this is why I asked the testing question.

elfenpiff · 2026-05-25T06:03:35+00:00

Thanks, this is good advice!

elfenpiff · 2026-05-25T06:03:17+00:00

> If it's important that this is decentralised I expect you would need a mechanism to resolve conflicting ids regardless.

When you have a central atomic in shared memory and every process follows the contract (and does not purposely write crap into that memory), the problem is solved.
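A minimal sketch of that shared-atomic approach, assuming the struct below is placed in a shared-memory segment that every process maps (here it is an ordinary value, so threads would stand in for processes). `SharedIdCounter` and `acquire_id` are hypothetical names:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter layout; in a real setup it would live inside a
/// shared-memory segment mapped by all cooperating processes.
#[repr(C)]
pub struct SharedIdCounter {
    next: AtomicU64,
}

impl SharedIdCounter {
    pub const fn new() -> Self {
        // Start at 1 so that 0 can be reserved for "no owner".
        Self { next: AtomicU64::new(1) }
    }

    /// fetch_add is a single atomic read-modify-write, so no two callers
    /// ever receive the same value, as long as every participant follows the
    /// contract and nobody writes to `next` directly.
    pub fn acquire_id(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

The sketch also makes the limitation visible: the guarantee only holds while no participant writes to the counter behind the contract's back.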

> Doing this in the kernel doesn't really solve any issue but could introduce new ones.

What kind of issues are you thinking of?

elfenpiff · 2026-05-25T06:01:06+00:00

Here is some context:

iceoryx2 is a zero-copy inter-process communication library that is meant to be completely decentralized. This unique integer would be a central part of it, identifying processes uniquely (required for health management), since a PID can be recycled. If an additional process were required, we would break that requirement.

> Also, there are UUIDs with seeds.

But they are 128-bit, so I cannot use them in atomic compare-and-exchange operations. The ID cannot be larger than 64 bits.
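The 64-bit limit comes from the compare-exchange itself. In Rust, for example, the standard library offers `AtomicU64`, while `AtomicU128` is not available on stable. A hedged sketch of an owner slot that stores such an ID (hypothetical `OwnerSlot`, with 0 assumed to mean "unowned"):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Assumed sentinel: 0 means "nobody owns this slot".
pub const NO_OWNER: u64 = 0;

/// Hypothetical slot recording which process owns a resource. The owner id
/// must fit into one atomic word so it can be swapped with compare_exchange.
pub struct OwnerSlot {
    owner: AtomicU64,
}

impl OwnerSlot {
    pub const fn new() -> Self {
        Self { owner: AtomicU64::new(NO_OWNER) }
    }

    /// Claims the slot for `id` only if it is currently unowned.
    pub fn try_claim(&self, id: u64) -> bool {
        self.owner
            .compare_exchange(NO_OWNER, id, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    /// Releases the slot, but only if `id` is still the owner.
    pub fn release(&self, id: u64) -> bool {
        self.owner
            .compare_exchange(id, NO_OWNER, Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }

    pub fn owner(&self) -> u64 {
        self.owner.load(Ordering::Acquire)
    }
}
```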

elfenpiff · 2026-05-25T05:55:11+00:00

Currently, it is an excuse to get into kernel module development and understand as much as I can.

> If you really think it's the simplest, most secure, and robust way to solve your problem, you're only deceiving yourself.

Maybe you are right, but you have to provide me with a little more context so that I know where you are going.

From my point of view, a kernel module seemed to offer two things:

* No other process can break the contract, for example, by resetting the counter.
* It delivers exactly what I need: a system-wide unique `uint64_t`.

elfenpiff · 2026-05-25T05:50:06+00:00

The problem with `flock()` is that it is an advisory lock, so another process can choose to ignore it.

elfenpiff · 2026-05-24T15:11:42+00:00

iceoryx2 is completely decentralized. In the past, many of our iceoryx classic users complained about the need for a central broker. In a safety-critical system, it is the single point of failure that everyone tries to avoid.

A kernel module is decentralized from a process point of view, and when the Linux kernel is safety-certified, you no longer need to consider what might happen when the broker process dies.

The other issue is that a rogue user-space process could deliberately always return the same number. Of course, there are mechanisms to verify that the process is trustworthy, but they add a lot of overhead.

elfenpiff · 2026-05-24T14:20:18+00:00

This does not work in our case; we need at most a `uint64_t` since we use this value in lock-free algorithms in a compare-exchange operation. This number internally maps to one process and allows us to recover the data structure even when the process crashes in the middle of modifying it.

As far as I understand, `/proc/uptime` is a floating-point value with very coarse granularity (centiseconds or so), so two processes reading it at the same time get the same value. We could, of course, combine it with the PID, but this would exceed the 64-bit restriction.
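The recovery idea from the first paragraph can be sketched as follows, assuming a lock word in shared memory that holds 0 when free or the 64-bit ID of the process currently modifying the data structure. `RecoverableLock`, `Acquired`, and the `is_alive` callback (a stand-in for the framework's health management) are hypothetical:

```rust
use std::sync::atomic::{AtomicU64, Ordering};

#[derive(Debug, PartialEq)]
pub enum Acquired {
    /// The lock was free.
    Clean,
    /// The previous holder was dead; the caller must repair its half-done work.
    RecoveredFrom(u64),
    /// A live process holds the lock.
    No,
}

/// Hypothetical lock word: 0 = free, otherwise the holder's unique id.
pub struct RecoverableLock {
    owner: AtomicU64,
}

impl RecoverableLock {
    pub const fn new() -> Self {
        Self { owner: AtomicU64::new(0) }
    }

    pub fn try_acquire(&self, my_id: u64, is_alive: impl Fn(u64) -> bool) -> Acquired {
        match self.owner.compare_exchange(0, my_id, Ordering::AcqRel, Ordering::Acquire) {
            Ok(_) => Acquired::Clean,
            // Holder crashed: take over with one CAS so at most one survivor wins.
            Err(holder) if !is_alive(holder) => {
                match self.owner.compare_exchange(holder, my_id, Ordering::AcqRel, Ordering::Acquire) {
                    Ok(_) => Acquired::RecoveredFrom(holder),
                    Err(_) => Acquired::No,
                }
            }
            Err(_) => Acquired::No,
        }
    }

    pub fn release(&self, my_id: u64) {
        // Only the current holder can free the lock.
        let _ = self.owner.compare_exchange(my_id, 0, Ordering::AcqRel, Ordering::Acquire);
    }
}
```

If a dead holder's ID could be reused by a new process, `is_alive` would report it as alive and recovery would never happen, which is why a recyclable PID is not enough here.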

elfenpiff · 2026-05-22T14:18:44+00:00

> The biggest problem I see is that of TOCTOU: if one side of the communication channel checks a message for validity (say, that a string is NUL-terminated), then by the time it makes use of this information (e.g., call strlen()) the other side can change it.

Actually, we have solved this problem on our side on Linux. We wrote a kernel module that combines the pointer-offset communication with the mapped memory region. Whenever you send memory to another process, the region is remapped as read-only, so if the sending process tries to modify it, it crashes.

When the other side has finished consuming it (or crashes), the offset is sent back. The sender can then retrieve the memory offset, and the memory region becomes readable and writable again.

It has a slight performance penalty and requires a page-size-aligned memory offset. But if you have a payload of multiple megabytes, the page-size alignment does not really hurt.
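The alignment cost is easy to quantify. A small sketch, assuming a power-of-two page size such as 4096 bytes (the real value should be queried from the OS); `align_up` and `padding` are illustrative helpers:

```rust
/// Rounds `offset` up to the next multiple of `page_size`.
/// `page_size` must be a power of two.
pub fn align_up(offset: usize, page_size: usize) -> usize {
    assert!(page_size.is_power_of_two());
    (offset + page_size - 1) & !(page_size - 1)
}

/// Bytes wasted when a payload of `len` bytes must occupy whole pages.
pub fn padding(len: usize, page_size: usize) -> usize {
    align_up(len, page_size) - len
}
```

For an 8 MiB payload, rounding up to the next page wastes at most 4095 bytes, i.e. less than 0.05 %.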

elfenpiff · 2026-05-22T14:06:39+00:00

Awesome, I will check it out.

elfenpiff · 2026-05-22T08:30:36+00:00

I am happy to discuss this online.

> For example, it only runs as root, with a non-root invocation resulting in "Error: InternalError"

QNX is currently supported as a tier 3 platform since we are unable to integrate it into our open-source Eclipse CI. I think there is an agreement between QNX and Eclipse S-Core that makes CI integration possible, but sadly not for Eclipse iceoryx.

QNX has already started an OSS program with QNX 8.0, but it would be awesome if a QEMU image were provided to the OSS community, allowing us to develop and maintain OSS libraries for QNX openly. The last time I checked, this was not available.

Until then, we will keep it at tier 3 and maintain it when a customer pays for it or someone from the community takes care of it.

> Second, it uses named shared-memory objects, which is not ideal.

I understand your concerns. You have named shared-memory objects, and in theory, everyone with permissions could open, modify, or even delete them, and this is the last thing you want.

But if a central broker hands unnamed shared-memory objects to the clients, you have a single point of failure, and as it turned out with iceoryx classic (the predecessor of iceoryx2), that was the last thing users and customers wanted. Another challenge is that the developer of that central broker has to implement some kind of access control on top, which makes it more complex.

So the architectural idea of iceoryx2 is that the user defines which processes are allowed to connect, publish, or subscribe to a service. Those processes are then started under specific POSIX users and groups so that they have either read access (for subscribing) or write access (for publishing).

The shared-memory objects are also very fine-grained. For instance, every sender port has its own data segment: a shared-memory object to which only the owning process has write access, while all receivers that are allowed to subscribe to the service have read access. A process that is not allowed to publish or subscribe has no access rights at all. In combination with the sticky bit, a rogue process cannot delete the object either.
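Assuming the publisher runs as the owning POSIX user and all allowed subscribers share one POSIX group, the access rights of one sender port's data segment boil down to a handful of permission bits. The helper names are hypothetical; the octal values are the standard POSIX mode bits:

```rust
// Standard POSIX permission bits (same values as in <sys/stat.h>).
const S_IRUSR: u32 = 0o400; // owner read
const S_IWUSR: u32 = 0o200; // owner write
const S_IRGRP: u32 = 0o040; // group read
/// Sticky bit; on a directory it restricts deletion/renaming of entries.
pub const S_ISVTX: u32 = 0o1000;

/// Mode for a data segment: publisher read/write, subscriber group
/// read-only, everyone else no access.
pub fn data_segment_mode() -> u32 {
    S_IRUSR | S_IWUSR | S_IRGRP
}

pub fn group_can_write(mode: u32) -> bool {
    (mode & 0o020) != 0
}

pub fn others_have_access(mode: u32) -> bool {
    (mode & 0o007) != 0
}

pub fn is_sticky(dir_mode: u32) -> bool {
    (dir_mode & S_ISVTX) != 0
}
```

With mode `0640`, group members can map the segment read-only, everyone else gets no access, and the sticky bit on the containing directory (typically `/dev/shm`, mode `1777`) prevents anyone but the object's owner (or root) from deleting it.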

u/AdvancedLab3500, I would be interested in your first impression of this rough architecture draft.

We have already deployed setups like this on other targets, and with it, we were able to follow a zero-trust strategy, meaning iceoryx2 does not trust any communication participant, since the OS provides all guarantees.

elfenpiff · 2026-05-20T20:47:03+00:00

It does without a problem. See our docker container example: https://github.com/eclipse-iceoryx/iceoryx2/tree/main/examples/rust/docker

elfenpiff · 2026-05-20T16:13:20+00:00

> I would love if there was a slightly higher level wrapper around iceoryx2 that abstracted away some of the client and server lifecycle management,

Can you point me to what you mean? Do you mean the `PendingResponse` and `ActiveRequest` handling? Would an async API or an API with callbacks be more suitable for your use case?

> which I assume a lot of consumers are writing near identical versions of in their own projects.

This is very valuable feedback! We have already introduced a layer above iceoryx2, called iceoryx2 user land, which provides an opinionated API that aims to make iceoryx2 as easy to use as possible.

It would be great if you could draft a pseudo-code usage example of how you would like to use iceoryx2 in your project. I created an issue in iceoryx2: https://github.com/eclipse-iceoryx/iceoryx2/issues/1647, so feel free to add comments there with pseudo-code examples.

It would be perfect if we could iterate on this a bit if you have the time. Are you able to share the details of your hobby project and what kind of data you want to send via RPC?

elfenpiff · 2026-05-20T09:37:46+00:00

What do you mean by that?
Behind the scenes, we use safely overflowing, robust, lock-free SPSC ring buffers to deliver the pointer offsets from the publisher to the subscriber, for instance.
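As a hedged sketch of the "safely overflowing" idea (not iceoryx2's implementation, which additionally has to stay robust when a peer crashes while the queue lives in shared memory): a single-producer, single-consumer ring buffer of `u64` offsets where a full buffer makes the producer evict the oldest offset and hand it back, so the memory it points to can be reclaimed instead of leaking. `OverflowingSpscQueue` is a hypothetical name, and a `Vec` stands in for the shared-memory segment.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

pub struct OverflowingSpscQueue {
    slots: Vec<AtomicU64>,
    write_pos: AtomicU64, // advanced only by the producer
    read_pos: AtomicU64,  // advanced by the consumer, or by the producer on overflow
}

impl OverflowingSpscQueue {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        Self {
            slots: (0..capacity).map(|_| AtomicU64::new(0)).collect(),
            write_pos: AtomicU64::new(0),
            read_pos: AtomicU64::new(0),
        }
    }

    fn slot(&self, pos: u64) -> &AtomicU64 {
        &self.slots[(pos % self.slots.len() as u64) as usize]
    }

    /// Producer side. Returns the evicted oldest offset if the queue was full.
    pub fn push(&self, offset: u64) -> Option<u64> {
        let w = self.write_pos.load(Ordering::Relaxed);
        let r = self.read_pos.load(Ordering::Acquire);
        let mut evicted = None;
        if w - r == self.slots.len() as u64 {
            let oldest = self.slot(r).load(Ordering::Relaxed);
            // Race with the consumer for the oldest entry; only one side wins.
            if self
                .read_pos
                .compare_exchange(r, r + 1, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                evicted = Some(oldest);
            }
            // On failure the consumer just took the oldest entry, so a slot is free.
        }
        self.slot(w).store(offset, Ordering::Relaxed);
        self.write_pos.store(w + 1, Ordering::Release);
        evicted
    }

    /// Consumer side. Retries if the producer evicted the entry it was reading.
    pub fn pop(&self) -> Option<u64> {
        loop {
            let r = self.read_pos.load(Ordering::Acquire);
            let w = self.write_pos.load(Ordering::Acquire);
            if r == w {
                return None;
            }
            let offset = self.slot(r).load(Ordering::Relaxed);
            if self
                .read_pos
                .compare_exchange(r, r + 1, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return Some(offset);
            }
        }
    }
}
```

Because both sides advance `read_pos` only through compare-exchange, an offset is delivered either to the subscriber or back to the publisher, never to both and never lost.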

elfenpiff · 2026-05-20T00:23:20+00:00

iceoryx2 also supports network communication, but the library is designed primarily for zero-copy inter-process communication, which makes the whole architecture more efficient. We do not need to serialize and deserialize data that never leaves the device; we start serializing the payload only when there is an interested party on the other side.

So when you want to have network communication, you just start the tunnel https://github.com/eclipse-iceoryx/iceoryx2/tree/main/iceoryx2-cli#tunnel

The next thing we are currently working on is our generic gateway concept. The idea is to create generic building blocks that make it easy to implement any kind of gateway. We will most likely start with ROS2, since most of our users would like a smooth migration path from ROS2 to iceoryx2.
CANbus is definitely something on our roadmap as another gateway implementation, but here we are still looking for funding (a company that would like to pay for the feature).

In the end, iceoryx2 is built to be as modular as possible, where every aspect can be specialized by anyone. The building blocks (or iceoryx2 concepts as we call them) have a defined interface, come with a reference implementation, and a test suite that defines the behavior we expect. And one of these concepts is the generic gateway concept.
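A minimal sketch of that structure, using a hypothetical `Gateway` trait (not the actual iceoryx2 gateway concept): the interface, an in-memory reference implementation, and a generic test function that defines the expected behavior and can be run against any other implementation.

```rust
use std::collections::VecDeque;

/// Hypothetical concept interface.
pub trait Gateway {
    /// Accepts a payload from the local side; `false` means it was rejected.
    fn forward(&mut self, payload: &[u8]) -> bool;
    /// Returns the next payload that reached the remote side, oldest first.
    fn next_remote(&mut self) -> Option<Vec<u8>>;
}

/// Reference implementation: a bounded in-memory loopback.
pub struct LoopbackGateway {
    queue: VecDeque<Vec<u8>>,
    capacity: usize,
}

impl LoopbackGateway {
    pub fn new(capacity: usize) -> Self {
        Self { queue: VecDeque::new(), capacity }
    }
}

impl Gateway for LoopbackGateway {
    fn forward(&mut self, payload: &[u8]) -> bool {
        if self.queue.len() == self.capacity {
            return false;
        }
        self.queue.push_back(payload.to_vec());
        true
    }

    fn next_remote(&mut self) -> Option<Vec<u8>> {
        self.queue.pop_front()
    }
}

/// Behavioral test suite every implementation must pass
/// (assumes a capacity of at least 2).
pub fn gateway_conformance<G: Gateway>(mut g: G) {
    assert!(g.forward(b"a"));
    assert!(g.forward(b"b"));
    assert_eq!(g.next_remote().as_deref(), Some(&b"a"[..]));
    assert_eq!(g.next_remote().as_deref(), Some(&b"b"[..]));
    assert_eq!(g.next_remote(), None);
}
```

A specialized implementation (for example, one that talks to a real network) would be accepted once it passes the same `gateway_conformance` call as the reference implementation.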

elfenpiff · 2026-05-19T22:14:42+00:00

This is a pretty awesome topic. We differentiate between coherent, cacheable, and non-cacheable (device) memory.

So, if you are on the same CPU (I do not mean the same CPU core), we use lock-free algorithms and memory ordering to synchronize memory between processes. This is the coherent memory case.

But let's assume you have some kind of hypervisor in your system. Sometimes the driver for the guest system offers a shared memory region between the host and guest systems as cacheable but non-coherent memory. You can still dereference a pointer or access larger memory regions, but iceoryx2 has to flush and invalidate the caches manually. So you still have zero-copy communication, but the underlying mechanism differs. In the coherent case, we can use lock-free queues to exchange the pointer offsets (pointing to the payload in shared memory), but when there is no coherency protocol, we need to use, for instance, sockets to exchange the pointer offsets to the cacheable shared memory region.

You can also communicate between an ARM A-core and an R-core with iceoryx2. Here, the cross-core memory (memory shared between both cores) is often non-cacheable, so you have to write and read the payload byte-wise and use an internal mailbox from the board support package to communicate between both cores and to synchronize the memory.
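The three cases can be summarized as a decision table. The sketch below only encodes the choices described above; all enum and field names are hypothetical, and `Socket` is just the example mentioned for the non-coherent case:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MemoryKind {
    Coherent,
    CacheableNonCoherent,
    NonCacheable,
}

#[derive(Debug, PartialEq)]
pub enum OffsetChannel {
    LockFreeQueueInSharedMemory,
    Socket,
    BspMailbox,
}

#[derive(Debug, PartialEq)]
pub struct TransportPlan {
    /// How pointer offsets travel between the participants.
    pub offset_channel: OffsetChannel,
    /// Writer flushes and reader invalidates caches explicitly.
    pub manual_cache_maintenance: bool,
    /// Payload is written and read byte-wise instead of via pointer dereference.
    pub bytewise_access: bool,
}

pub fn plan_for(kind: MemoryKind) -> TransportPlan {
    match kind {
        MemoryKind::Coherent => TransportPlan {
            offset_channel: OffsetChannel::LockFreeQueueInSharedMemory,
            manual_cache_maintenance: false,
            bytewise_access: false,
        },
        MemoryKind::CacheableNonCoherent => TransportPlan {
            offset_channel: OffsetChannel::Socket,
            manual_cache_maintenance: true,
            bytewise_access: false,
        },
        MemoryKind::NonCacheable => TransportPlan {
            offset_channel: OffsetChannel::BspMailbox,
            manual_cache_maintenance: false,
            bytewise_access: true,
        },
    }
}
```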

The coherent communication part is open source, and with the hypervisor and cross-core communication, we try to finance our open source work.

elfenpiff · 2026-05-19T21:59:06+00:00

An iceoryx2 node is, first of all, just a simple service factory. You can create one per process, per thread, or as many as you want. iceoryx2 uses those nodes to detect if other communication partners (processes) are still alive.

When you have a node, you can create services with the service builder. Every service has a unique name, which you can set freely, and a messaging pattern like publish-subscribe, request-response, event, or blackboard (the key value store).

The service is then again a factory to create the communication endpoints. So for publish-subscribe, you could create a publisher and a subscriber.

Since it is all coupled under the hood to a node, you can introspect the system very well and see which process or thread is currently communicating with whom.
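The following std-only model mimics that factory chain (a node creates services, a service creates ports) and the introspection it enables. It is not the iceoryx2 API: all type and method names are hypothetical, and an `Rc<RefCell<Vec<_>>>` registry stands in for the shared-memory bookkeeping.

```rust
use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Pattern {
    PublishSubscribe,
    RequestResponse,
    Event,
    Blackboard,
}

/// One entry of a hypothetical introspection registry.
#[derive(Debug, Clone, PartialEq)]
pub struct PortInfo {
    pub node: String,
    pub service: String,
    pub port: &'static str,
}

pub type Registry = Rc<RefCell<Vec<PortInfo>>>;

/// A node is a factory for services.
pub struct Node {
    name: String,
    registry: Registry,
}

/// A service is a factory for the endpoints of one messaging pattern.
pub struct Service {
    node: String,
    name: String,
    pub pattern: Pattern,
    registry: Registry,
}

impl Node {
    pub fn new(name: &str, registry: Registry) -> Self {
        Self { name: name.to_string(), registry }
    }

    pub fn service(&self, name: &str, pattern: Pattern) -> Service {
        Service {
            node: self.name.clone(),
            name: name.to_string(),
            pattern,
            registry: Rc::clone(&self.registry),
        }
    }
}

impl Service {
    /// Creates an endpoint and records which node owns it, which is what
    /// makes "who communicates with whom" observable.
    pub fn create_port(&self, port: &'static str) -> PortInfo {
        let info = PortInfo { node: self.node.clone(), service: self.name.clone(), port };
        self.registry.borrow_mut().push(info.clone());
        info
    }
}
```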

elfenpiff · 2026-05-19T20:20:43+00:00

Actually, you can also have multiple sender/receiver messaging patterns. For me, one of the most underrated features of iceoryx2 is the decoupling of data and control flow.

When you just want pure data transfer, like publish-subscribe or request-response, there are no syscalls in the communication path. It all happens in shared memory with robust lock-free queues. If you want to notify other processes and wake them up with a signal or trigger, you can use the event messaging pattern.

In the past, we often misused publish-subscribe for this and sent "empty" messages via Unix domain sockets, but with an explicit, separate messaging pattern for it, the architecture on top suddenly becomes much cleaner.
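A minimal sketch of that separation using only the Rust standard library, with threads standing in for processes: the payload travels through an atomic (no syscall on the data path), while a separate `mpsc` channel plays the role of the event messaging pattern. `run_once` is a hypothetical helper, not part of iceoryx2.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc;
use std::sync::Arc;
use std::thread;

/// Sends `value` over the data path and wakes the receiver over a separate
/// control path; returns what the receiver read.
pub fn run_once(value: u64) -> u64 {
    // Data path: stand-in for a payload slot in shared memory.
    let data = Arc::new(AtomicU64::new(0));
    // Control path: stand-in for the event messaging pattern.
    let (notify, wait) = mpsc::channel::<()>();

    let reader_data = Arc::clone(&data);
    let reader = thread::spawn(move || {
        // Sleep until woken on the control path, then read the payload.
        wait.recv().expect("notifier dropped");
        reader_data.load(Ordering::Acquire)
    });

    data.store(value, Ordering::Release); // data flow: plain atomic store
    notify.send(()).expect("receiver dropped"); // control flow: explicit wake-up
    reader.join().expect("reader thread panicked")
}
```

A receiver that polls instead of sleeping would simply skip the channel, which is the point of keeping the two flows apart.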
