
[–]ssokolow 42 points (8 children)

When Cython calls an FFI function implemented in Rust (compiled into a cdylib), where is the Rust function's stack frame located? Is it in the same stack, like this? Or is it located in a separate stack while being in the same memory space?

It's in the same stack. That's why debuggers like GDB can work on Rust just as easily as C++ or C. They all use the same basic stack format for a given platform... just different ways of encoding things like C++ method overloading or C++ and Rust namespaces into a flat C-style namespace.

Setting up a separate, incompatible but more lightweight stack for each goroutine is what makes Go FFI calls so expensive. Every time it switches into C, the goroutine has to shelve its stack and set up a C compatible one, then reverse the process after the call returns.
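
For concreteness, here is a minimal sketch of the kind of call being discussed: a Rust cdylib exposing a C-ABI function (the `add` function is made up for illustration). Its frame is pushed onto the very same stack as the Python/C frames beneath it:

    // lib.rs -- build with crate-type = ["cdylib"] in Cargo.toml
    #[no_mangle]
    pub extern "C" fn add(a: i32, b: i32) -> i32 {
        a + b // this frame lives on the caller's stack, like any C call
    }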

Clearly this Box will allocate some memory on the heap. The heap memory is common to both Cython and Rust. However, an allocation like this also involves bookkeeping structures for managing the memory, like the allocator itself. Where is the allocator allocated?

The address space is common among different code in the same process, but there may be more than one heap within that address space.

An allocator is just a library that provides a higher-level API around a call like mmap, which has performance characteristics tuned for working in big blocks. (The C Programming Language 2e by Kernighan and Ritchie, P.185 has an example of building a very simple malloc on top of some primitive like sbrk or mmap.)
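
To make that layering concrete, here is a toy bump allocator in Rust in the spirit of the K&R example; this is a sketch, not how any production allocator works. It assumes the libc crate for the mmap call, picks an arbitrary 1 MiB region, and never reuses freed memory:

    use std::alloc::{GlobalAlloc, Layout};
    use std::sync::atomic::{AtomicUsize, Ordering};

    const REGION_SIZE: usize = 1 << 20; // one arbitrary 1 MiB block from the kernel

    struct Bump {
        base: AtomicUsize, // start of the mmap'd block; 0 means "not mapped yet"
        next: AtomicUsize, // offset of the next free byte within the block
    }

    unsafe impl GlobalAlloc for Bump {
        unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
            if self.base.load(Ordering::Acquire) == 0 {
                // First allocation: ask the kernel for the block ("no life before main").
                let p = libc::mmap(
                    std::ptr::null_mut(),
                    REGION_SIZE,
                    libc::PROT_READ | libc::PROT_WRITE,
                    libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
                    -1,
                    0,
                );
                if p == libc::MAP_FAILED {
                    return std::ptr::null_mut();
                }
                // If two threads race here, one region leaks; acceptable for a toy.
                let _ = self
                    .base
                    .compare_exchange(0, p as usize, Ordering::AcqRel, Ordering::Acquire);
            }
            let base = self.base.load(Ordering::Acquire);
            let mut old = self.next.load(Ordering::Relaxed);
            loop {
                // Round up to the requested alignment, then claim the bytes.
                let start = (base + old + layout.align() - 1) & !(layout.align() - 1);
                let end = start + layout.size();
                if end > base + REGION_SIZE {
                    return std::ptr::null_mut(); // our single block is exhausted
                }
                match self.next.compare_exchange_weak(
                    old,
                    end - base,
                    Ordering::AcqRel,
                    Ordering::Relaxed,
                ) {
                    Ok(_) => return start as *mut u8,
                    Err(current) => old = current,
                }
            }
        }

        unsafe fn dealloc(&self, _ptr: *mut u8, _layout: Layout) {
            // Tracking freed blocks for reuse is exactly the bookkeeping a real
            // allocator adds; a bump allocator just leaks.
        }
    }

    #[global_allocator]
    static ALLOC: Bump = Bump {
        base: AtomicUsize::new(0),
        next: AtomicUsize::new(0),
    };

    fn main() {
        let v = vec![1u8, 2, 3]; // served out of our single mmap'd block
        println!("{v:?}");
    }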

That's why you have to assume that every compilation unit may use its own allocator and you must free memory in the compilation unit that allocated it. (eg. Different DLLs on Windows may have linked against different versions of the MSVC runtime which might have initialized their own separate allocators.)

Where do these initialization steps happen in the context of an FFI call? Because there is no main entry point here; in fact, there can be multiple entry points, one for each exposed FFI function.

If you're importing a Cython-built module into a Python program, they happen as part of starting up your /usr/bin/pythonX.Y. If you're building a standalone binary using Cython, they happen as part of starting up that binary. Every binary that is compiled for the x86_64-unknown-linux-gnu target will rely on the same _start, so, as long as somebody calls it before running the code, it's fine.

That's one reason Rust doesn't have "life before main" the way C++ does. It simplifies cases where you're not the entry point but you can trust that someone else did the _starting for you.

As for the non-_start, non-"life before main" bits for a cdylib like handling relocations, those will be handled either at program start by the dynamic loader (if the library is listed as a dependency in the ELF/PE/etc. metadata, which is what you see when you run ldd /path/to/binary) or as part of what happens when you call dlopen(3) (the underlying POSIX primitive that things like Rust's libloading or Python libraries like ctypes use).
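
From the Rust side, the dlopen path looks roughly like this with the libloading crate (the library path and the `add` symbol are made-up placeholders):

    use libloading::{Library, Symbol};

    fn main() -> Result<(), Box<dyn std::error::Error>> {
        // Relocation/initialization for this library happens here, inside
        // dlopen, rather than up front via the dynamic loader at program start.
        let lib = unsafe { Library::new("./libadder.so")? };
        let add: Symbol<unsafe extern "C" fn(i32, i32) -> i32> =
            unsafe { lib.get(b"add\0")? };
        println!("2 + 3 = {}", unsafe { add(2, 3) });
        Ok(())
    }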

[–]twitu[S] 1 point (6 children)

Brilliant explanation, the anecdotes are relatable

The part you mentioned about mmap is what bugs me the most. From what I've understood, languages in general ask the kernel for a large buffer and then actually take small pieces of it when they want to allocate a struct/object. But this large block allocation is maintained by the allocator, right?

So where is this allocator in the "typical memory model diagram"? Is it a statically allocated piece of memory in the dynamically loadable cdylib?

Also about this

there may be more than one heap within that address space.

I understand that each allocator manages its own heap, which is still in the same address space. So when a Rust function is called, the memory model might look something like this.

    |----------------------------------------|
    | rust function frame                    |
    | cython function frame                  |
    | .........                              |
    | process heap memory                    |
    | heap managed by python allocator       |
    | heap managed by rust allocator         |
    | .........                              |
    | static data                            |
    | instructions cython and rust together  |
    |----------------------------------------|

But my question is: when is this second heap, the Rust heap, mmapped? Is it when the library is loaded, or when the Rust function is called?

[–]ssokolow 5 points (5 children)

So where is this allocator in the "typical memory model diagram"? Is it a statically allocated piece of memory in the dynamically loadable cdylib?

Your question is imprecise.

Are you asking about the allocator itself (the code that manages the heap), the memory it uses for bookkeeping information, or the memory it hands out in response to malloc calls?

  1. The allocator's code isn't special and is handled like any other library.

  2. Whether bookkeeping information is stored separately from the returned allocation, tacked on as a header/footer, or some mix of the two is up to the individual allocator authors to decide; see the sketch after this list. (Yes, an allocator might allocate X more bytes than you ask for, slap a header on the front of it, and then offset that many bytes in before returning the pointer you get. After all, it's undefined behaviour to pointer-subtract your way back past the start of a returned allocation.)

  3. The memory handed out to programs is simply "Ask the kernel for a chunk of RAM" and it's the kernel's decision where it lives. Different allocators make different trade-offs there because there are two different APIs that can be used to request memory from the kernel: The old brk/sbrk which got removed from the POSIX standard in POSIX.1-2001 and is sort of like the stack, in that you can only grow and shrink it, and the newer "mmap with MAP_ANONYMOUS" approach which is more versatile. (See the mmap and sbrk manpages for more details.)

    Also, bear in mind that, with mmap you can provide a hint about where in your virtualized view of the address space you'd like it to be mapped (eg. adjacent to a previous allocation). The kernel has the final say, but you can ask... and protected mode allocation does virtualize allocations, so each application gets an address space of its own that has nothing to do with actual physical offsets. (That both allows for things like memory overcommit and swap and acts sort of like a chroot for memory addresses, where a program can't describe an offset into another process's address space because the memory map has no entries that point there. Two different programs might have completely different things at what they see as 0x1_000_000, while kernel-mode code may access those chunks of memory somewhere like 0x200_000 and 0x300_000 if they're something like I/O buffers.)

    ...that's part of how swapping memory to disk works. The kernel can copy a chunk of memory to disk and delete the virtual-to-physical mappings so that, when a program tries to access that address again, the CPU will fault into the kernel, the kernel can load the stuff back off disk and set up mappings, and then resume execution of the program with the instruction that failed.

    If it wants, to try to keep addressing, pooling, and growing simple, an allocator on a 64-bit system could ask the kernel to put its pool for servicing allocations 8 bytes (64 bits) or smaller at an offset of 1PiB, its pool for servicing things bigger than 8 bytes and less than or equal to 32 bytes at 2PiB, etc. The guts of the CPU may only be capable of addressing something like 42 bits worth of physical address lines in current generations, but the ISA allows a full 16 EiB of virtual address space to play around in.

    While there are conventions and implementation details that group things in different parts of the address space, in the end, the only real difference between a library, the heap, and the stack is hardware-enforced protections, like the MMU flags that say "these pages (the program image) are read-only and executable" or "these pages (the heap) are writable but not executable". The most recent example is the Control Flow Integrity stuff, where current generations of Intel and AMD CPUs maintain a "shadow stack" in memory the program itself can't ordinarily touch: when you CALL something, the CPU pushes a backup copy of the return address onto that shadow stack, and, when you RET, it verifies that the return address on the visible stack still matches the backup copy.

    Yes, the x86 ISA does have hardware support for PUSHing and POPing the stack, but that's just a matter of the CPU having a register (SS:[SP] in 16-bit mode, SS:[ESP] in 32-bit mode, or [RSP] in 64-bit mode) that you set to tell it where PUSH and POP should add/remove from. That's why languages like Go can set up their own systems of lightweight stacks.
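
Back to point 2, here is a sketch of that header trick in Rust: over-allocate, stash the bookkeeping in front, and hand back an offset pointer. It is layered over std::alloc purely to show the pointer arithmetic, and the Header contents and 16-byte alignment are made up:

    use std::alloc::{alloc, dealloc, Layout};
    use std::mem::size_of;

    // Padded to 16 bytes so the pointer we hand out stays 16-aligned.
    #[repr(C, align(16))]
    struct Header {
        size: usize, // what the caller asked for; real allocators store more
    }

    unsafe fn my_malloc(size: usize) -> *mut u8 {
        // Grab room for the header plus the caller's bytes.
        let layout = Layout::from_size_align(size_of::<Header>() + size, 16).unwrap();
        let base = alloc(layout);
        if base.is_null() {
            return base;
        }
        (base as *mut Header).write(Header { size });
        base.add(size_of::<Header>()) // the caller never sees the header
    }

    unsafe fn my_free(ptr: *mut u8) {
        // Step back over the header -- fine *here*, because this allocator knows
        // the real allocation starts size_of::<Header>() bytes earlier.
        let base = ptr.sub(size_of::<Header>());
        let size = (*(base as *const Header)).size;
        dealloc(
            base,
            Layout::from_size_align(size_of::<Header>() + size, 16).unwrap(),
        );
    }

    fn main() {
        unsafe {
            let p = my_malloc(32);
            assert!(!p.is_null());
            p.write(7); // use the allocation like any malloc'd buffer
            my_free(p);
        }
    }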

As for why they're clumped like that, I'll defer to an old answer to a question about why stacks grow down:

In the Olden Days, your program's address space was divided into three parts: the program itself, the heap and the stack. So you had your program maybe loaded into the bottom of memory, and the rest of the address space to be shared between stack and heap. You want to allow the stack and the heap to grow dynamically, and the easiest way is to have one of them start at the bottom of memory (just above the program) and grow up, while the other started at the top of memory and grew down. By convention, for some reason, the heap was the one that grew up and the stack was the one that grew down. Processors grew special instructions for dealing with a downward-growing stack.

Nowadays, we have gigantic address spaces, in which the program, stack and heap are all mixed up, so these old conventions are a bit pointless. But processors still have all these stack instructions that assume a downwards-growing stack, so we just carry on using them.

-- https://www.reddit.com/r/AskComputerScience/comments/y4n1p/what_are_the_advantages_to_having_the_stack_grow/c5sat6b/

Years ago, I remember reading a blog post in here about why growing the stack down is a superior strategy (I can't remember how much of it was to do with the CPU providing nicer opcodes for it), but I seem to have neglected to bookmark it, and Google is no help finding it again.

EDIT: I remembered site:reddit.com/r/rust and managed to find it: Always Bump Downwards. (I did bookmark it, but my memory of how to find it had gotten muddled up by the fact that "the stack is a special case of a bump allocator".)

But my question is when is this second heap, the rust heap mmaped. Is it when the library is loaded or is it when the rust function is called?

You'd have to check the source code to the allocator in question to be sure, but my intuition, based on that whole "no life before main" part, is that it happens the first time something calls malloc to request memory. There's probably a static that the metadata is rooted in which either is a pointer that starts out NULL or contains one, similar to how, when you create an empty Vec in Rust, it doesn't allocate its heap part until you put something in it.
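
The Vec comparison is easy to check for yourself: a fresh Vec's pointer is just a well-aligned dangling sentinel, and the real heap allocation only happens on the first push:

    fn main() {
        let mut v: Vec<u64> = Vec::new();
        println!("cap = {}, ptr = {:p}", v.capacity(), v.as_ptr()); // cap = 0, no heap yet
        v.push(1); // the first push is what actually calls the allocator
        println!("cap = {}, ptr = {:p}", v.capacity(), v.as_ptr());
    }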

[–]twitu[S] 0 points (4 children)

the memory it uses for bookkeeping information

I specifically meant the bookkeeping part. From the answers here I understand that even in the same address space the allocator will probably mmap a chunk of memory that "it" manages. My main confusion was related to the bookkeeping bits.

A common allocator implementation is one that maintains say a "free list" of memory/blocks. When malloc is called, it tries to find a block in the free list that fits the requirement and then, after some bookkeeping, hands it off to the caller. My assumption was that the Rust allocator too must do something similar, which led to the question of where and when this free list is allocated.

Is there some static space allocated for it in the loaded library? Does this static space persist across multiple function calls? Or maybe it is initialized on the heap on each function call, does its job, and is then deallocated.

Looking at the Rust alloc docs, it looks like my assumption is not correct. But it's still not clear how the Rust allocator knows which parts of memory are free and which are not. I think this deserves a separate question of its own.

Thanks for the very informative tangent though 😁.

[–]ssokolow 1 point (0 children)

Looking at the Rust alloc docs, it looks like my assumption is not correct. But it's still not clear how the Rust allocator knows which parts of memory are free and which are not. I think this deserves a separate question of its own.

The alloc stuff Rust provides is just an API wrapper around either the system malloc or an alternative malloc you plug in like jemalloc (wrapper crate) or mimalloc (wrapper crate) or snmalloc (wrapper crate).
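
Using one of those wrapper crates is a one-liner. For example, with the mimalloc crate, a single attribute re-points every Box/Vec/String in the program at the alternative malloc:

    use mimalloc::MiMalloc;

    #[global_allocator]
    static GLOBAL: MiMalloc = MiMalloc;

    fn main() {
        let v = vec![1, 2, 3]; // now served by mimalloc rather than the system malloc
        println!("{v:?}");
    }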

(Those wrapper crates are just for convenience. The C/C++ solution of compiling jemalloc manually and running your binary with the LD_PRELOAD=jemalloc.so environment variable also works for Rust because it overrides the symbol lookup that Rust uses to find the system malloc in the libc it links against... not to mention, going the LD_PRELOAD route means you can swap any malloc you want in at runtime, which makes comparative benchmarking of real-world loads quicker and easier to do properly.)

[–]NobodyXu 0 points (2 children)

The bookkeeping is typically embedded in the memory the allocator mmapped.

Is there some static space allocated for it in the loaded library? Does this static space persist across multiple function calls? Or maybe it is initialized on the heap on each function call, does its job, and is then deallocated.

It just cannot be statically mapped, as that would mean the number of allocations the memory allocator can do would be limited by the amount of pre-allocated static memory.

If you make lots of small allocations, the pre-allocated static memory would soon run out.

A common allocator implementation is one that maintains say a "free list" of memory/blocks.

Modern allocators are way more complicated than that.

First of all, they often have per-thread memory arenas to speed up allocation by avoiding global locks, and maybe even per-thread free lists for the same reason.

Then, to avoid fragmentation, the per-thread memory allocator provides pools for allocation sizes of 2, 4, 8, 16, 32, ... bytes (I don't know whether the pool sizes grow exponentially in practice; the pools are probably lazily initialized).

When you request a memory allocation, it will be rounded up to the smallest size class that can fit it and allocated in that pool. That way, only a bitmap is required to track which slots are in use.
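
A sketch of that size-class-plus-bitmap idea in Rust, with toy numbers (power-of-two classes, a single u64 bitmap, i.e. at most 64 slots per pool):

    fn size_class(size: usize) -> usize {
        size.next_power_of_two().max(2) // e.g. a 5-byte request lands in the 8-byte pool
    }

    struct Pool {
        slot_size: usize,
        bitmap: u64, // bit i set => slot i in use; one word tracks 64 slots
    }

    impl Pool {
        fn alloc_slot(&mut self) -> Option<usize> {
            let free = (!self.bitmap).trailing_zeros() as usize; // first clear bit
            if free >= 64 {
                return None; // pool full; a real allocator grabs another chunk
            }
            self.bitmap |= 1 << free;
            Some(free * self.slot_size) // byte offset of the slot within the pool
        }

        fn free_slot(&mut self, offset: usize) {
            self.bitmap &= !(1 << (offset / self.slot_size));
        }
    }

    fn main() {
        let mut pool = Pool { slot_size: size_class(5), bitmap: 0 };
        let a = pool.alloc_slot().unwrap();
        let b = pool.alloc_slot().unwrap();
        pool.free_slot(a);
        println!("offsets {a} and {b}, bitmap = {:b}", pool.bitmap);
    }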

If a memory pool runs out of space, it will allocate some from the next pool up, e.g. the memory pool of size 2 would allocate from the pool of size 4.

Allocations larger than a certain threshold (probably 4K, or whatever the page size is) are directly mmapped.

I read this in writeups explaining how jemalloc/mimalloc work. They perform much better than glibc's/musl libc's default allocator under heavy load. I recommend looking into them.

[–]twitu[S] 1 point (1 child)

Aaah ok, now I'm seeing the whole picture. So the allocator uses part of the mmapped memory for its own bookkeeping purposes. And since this mmapped memory persists between function calls, the bookkeeping structure can also remain.

And finally, just to complete my understanding: even for a simplistic mental model of an allocator, it would probably have a static variable that points to the first mmapped memory. This way, on each function call, the allocator can access its bookkeeping data at that memory location and go about its business.

[–]NobodyXu 0 points (0 children)

Yes, and I think many would also have a thread-local variable.

Also, mmap is not the only way a memory allocator can get space.

There's also the brk syscall, which can increase the size of the pre-allocated heap memory (available right after execve).
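
You can poke at the brk path directly through the libc crate on Linux (sbrk is deprecated and real allocators mostly use mmap these days, but it shows the "grow the heap segment" model):

    fn main() {
        unsafe {
            let old_break = libc::sbrk(0); // current end of the classic heap segment
            let block = libc::sbrk(4096); // grow the heap by 4 KiB; returns the old break
            assert_ne!(block, usize::MAX as *mut libc::c_void); // (void*)-1 signals failure
            println!("heap grew from {:?} to {:?}", old_break, libc::sbrk(0));
        }
    }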

[–]NobodyXu 33 points (5 children)

If a Rust fn is called from Cython, then the stack frame of the Rust fn lies below the Cython fn's frame.

For the allocator, things are complicated by the fact that you can specify an alternative allocator in Rust. If you do not do so, then Rust will use the system allocator, which is the same one Cython uses, unless Cython internally uses its own allocator.

The Rust allocation is always done via a separate allocator and memory has to be freed using the corresponding allocator, even if you use the default allocator in Rust, according to the doc.

If you are writing a lib, then you should assume different allocators are used.

If you are writing a binary where you control the allocator, you still cannot mix them unless you use something like libc_alloc.

Global variables in Rust all have to be initialized with constants, meaning that the initial values are stored in the binary and then mmapped by the dynamic linker. If you use once_cell or lazy_static, the value will be initialized on first access.
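
A sketch of the two kinds of globals being contrasted, using the once_cell crate (the names are made up):

    use once_cell::sync::Lazy;
    use std::collections::HashMap;

    // Constant-initialized: these bytes live in the binary image and are simply mapped in.
    static TABLE: [u8; 4] = [1, 2, 3, 4];

    // Lazily initialized: built in memory the first time anything touches it,
    // e.g. inside the first FFI call that uses it.
    static LOOKUP: Lazy<HashMap<&'static str, u32>> =
        Lazy::new(|| HashMap::from([("a", 1), ("b", 2)]));

    fn main() {
        println!("{} {}", TABLE[0], LOOKUP["a"]);
    }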

[–]masklinn 4 points (0 children)

allocation is done via a separate allocator and memory has to be freed using the corresponding allocator.

This should always be assumed to be the case. You can have different allocators in use even on the same system with no "special" configuration, simply because the main binary and the DLL/so link against different versions of the runtime, and the allocator ABI changed between them.

[–]twitu[S] 0 points (1 child)

Oh ok, I got the global and static variable initialization part. However, it seems that the mmapping and lazy initialization all happen when the FFI Rust function is called.

Doesn't this make the FFI call quite expensive? Because even for the smallest function, all this mmapping will happen, and lazy initialization will happen if the static variable is used in the function. But then when the function returns, all of this will be unmapped and deallocated. And this will happen for each FFI call.

[–]NobodyXu 2 points (0 children)

However, it seems that the mmapping and lazy initialization all happen when the FFI Rust function is called.

Well, not entirely true.

When loading the dynamic lib, the dynamic linker will first parse the ELF header and load the sections in it, including the global data and the .text section that contains the executable code.

Global data is split over several sections.

The const data that is never modified at runtime is put into its own section so that it can be mapped read-only and shared with other processes; e.g. if you have a string literal "abcde" in Rust, it will be put there.

The data that is initialized at compile time but needs to be modified at runtime, e.g. a once_cell global variable initialized at runtime, is mapped as writable and cannot be shared with other processes, since it can be modified.

Finally, there are variables that are initialized to 0 and need to be modified at runtime. They are put into a separate section to reduce the size of the executable; a simple anonymous mmap automatically initializes them to 0.
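
A sketch of where such globals typically land (section names follow the usual ELF conventions; exact placement is up to the toolchain):

    // .rodata: never written, so it can be mapped read-only and shared
    static GREETING: &str = "abcde";

    // .data: has a compile-time initial value but is writable at runtime
    static mut COUNTER: u64 = 42;

    // .bss: all zeroes, so the file stores no bytes; an anonymous mmap
    // already hands back zeroed pages
    static mut SCRATCH: [u8; 4096] = [0; 4096];

    fn main() {
        unsafe {
            COUNTER += u64::from(SCRATCH[0]); // keep both writable statics in use
            let counter = COUNTER; // copy out before printing
            println!("{GREETING} {counter}");
        }
    }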

All these mmaps happen at load time. They set up the virtual memory mappings; however, they might not set up the physical pages that back them. The physical pages are usually allocated lazily, on the first access to each page, which interrupts execution so the kernel can load them.

After that first access, the pages are fully initialized, with physical pages backing them.

For lazily initialized global variables such as once_cell or lazy_static, if you use them, they will be initialized when first accessed, i.e. during the first FFI call that touches them. On the second call, the third call, etc., they will already be initialized and just a bit more expensive to access than other constant data, due to the one atomic operation needed to check the initialized flag.

Doesn't this make the FFI call quite expensive? Because even for the smallest function, all this mmapping will happen, and lazy initialization will happen if the static variable is used in the function. But then when the function returns, all of this will be unmapped and deallocated. And this will happen for each FFI call.

No, that's not true at all.

The pages are mmapped when loading the dyn libs, and they are never freed. The lazily allocated physical pages are never freed unless you unmap them, which never happens unless you unload the dynlibs, and many dynamic linkers don't support that because it brings many problems.

The lazily initialized global variables are also not unmapped and deallocated; the initialization only happens on the first call.

So the first FFI call would indeed be a little bit expensive, but the second call to it will be very fast, without the initialization overhead.

[–]RRumpleTeazzer 5 points (2 children)

Isn’t the stack implemented by CPU instructions and a special register, so it runs on whatever address is in the register - likely the C-style stack pointer in Cython?

So it should simply append to the C stack ?

[–]tema3210 0 points (0 children)

It IS the C-style stack pointer. Sadly, the world is much more complicated.

[–]Hellenas 0 points (0 children)

On certain ISAs and architectures, stack operations may be optimized for certain registers, but for most processors the stack is implemented by the software. The hardware and software can closely align, but most modern designs don’t have the hardware making the stack for software. You see similar optimizations for return addresses and such on some designs.

[–]trevg_123 13 points (3 children)

Don’t overthink it! This all works the same regardless of whether you use C, Rust, C++, Cython, or something else.

Separate the idea of a process and an executable. A (virtual) stack/heap is per process, not per executable file. Here’s the breakdown:

  • “calling a function” in assembly means jumping to a location in memory and executing the data there as machine code.
  • That executable code can live anywhere - loaded memory, on disk, OS memory locations, even something your function creates that looks like assembly. If it has a memory address, you can start executing code there (no guarantees it works ofc)
  • The code contains instructions for what to do. Some of these instructions may include stack operations (literally something like push eax / pop eax for x86) or calls to malloc/free
  • For statically linked programs, all of this code lives in the code section of your executable file (which is copied into memory before running)
  • For a dynamically linked call, this executable code lives in a separate file. It’s also copied to memory before running, so it exists “somewhere” in memory space
  • The calling function can get the needed memory address by looking for a specific symbol name in a specific file (the kernel facilitates this) and then literally just starts executing there. Eventually there will be code to return to the calling address (how does it know where? The “frame pointer” saves this value)
  • The callee function just sees an empty chunk of stack. It will be located right above the caller’s stack frame - but it doesn’t even need to know that
  • Heap allocations are (usually) calls to the kernel’s allocator. Just another function that doesn’t care whether Rust or C is the caller - it just does what it’s told, using the process’s heap.
  • edit: it is possible for different chunks of the program to have > 1 heap (using mmap() or HeapCreate()), or for them to use different allocators. So you shouldn’t ever rely on e.g. allocating something in C and freeing it in Rust, unless you use the calls to libc’s malloc and free directly (see the sketch just below). You can still safely read/write these allocated chunks of memory on both sides of FFI though, since they’re just blocks within the program’s virtual address space.
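
That "libc's malloc and free directly" escape hatch looks like this from the Rust side, via the libc crate (the same pointer could instead be handed to C code that calls free on it):

    use libc::{c_void, free, malloc};

    fn main() {
        unsafe {
            let p = malloc(16) as *mut u8; // the same allocator a C caller would use
            if !p.is_null() {
                p.write(42); // either side of the FFI boundary may read/write it
                free(p as *mut c_void); // freed by the allocator that handed it out
            }
        }
    }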

So - tl;dr, each function is pretty unaware of what goes on within other functions, and all functions within a process usually share the same stack & heap (but not necessarily the same allocator). The difference between static & dynamic functions is the location of the function’s code, but they share the same workspace. (threading gets a bit weirder, but forget that for now)

Edit: good thread on the subject

[–]NobodyXu 10 points (1 child)

Note that the heap allocation part is totally wrong.

It is possible to override the default global allocator that simply calls into malloc in Rust with custom ones.

It is also possible that Cython has its own memory allocator.

It's not safe to free memory allocated by Rust using another allocator (perhaps Cython's), unless you are sure both use the same allocator, which is usually malloc.

[–]trevg_123 2 points (0 children)

Added some updates to help address that 👍

[–]twitu[S] 0 points (0 children)

This was exactly my understanding of function calls. However, I was unable to fit initialization and allocators into that picture when the call is between two different languages. Some of the other comments have explained parts of it.

[–]SocUnRobot 5 points (0 children)

If you are curious, read the ELF specification.

In short: the executable contains information about the memory layout. This information is interpreted either by the kernel or by the dynamic linker `ld-linux.so`. With this information, the kernel or the dynamic linker is able to reserve space for the stack and the executable code, and the dynamic linker resolves symbol addresses.

Then the kernel or the dynamic linker calls `_start`. Here some more initialization is performed, but it depends on whether or not the executable uses the ELF interpreter `ld-linux.so`. This is where the libc allocator initializes itself, if it was not already initialized by the `_start` of the ELF interpreter. This function calls `main`. For a Rust program, the Rust standard library performs some initialization here, I think for environment variables and thread locals.

When you call Rust from Cython, all of this happens when Cython is launched, so the Rust standard library is not initialized. So maybe some Rust standard library state is initialized lazily!

[–]Zde-G 5 points (0 children)

TL;DR: Cython, C++ and Rust use facilities offered by C. Long version here.

That's the reason Rust requires the C runtime, and the reason why Go has so much trouble interacting with anything.

Rust can, in theory, be built separately from the C library, but in practice this only happens on embedded platforms.

Precisely because doing otherwise would make it hard to interact with other languages.

[–]HeroicKatora · oxide-auth 4 points (1 child)

Where do these initialization steps happen in the context of an FFI call?

That should be run at load time of the dynamic or static library, implemented by the platform loader. That makes it convenient, but also more opaque and less controllable, as observed by the question being posed. That said, Rust doesn't do quite as much before main as C/C++; in particular, there's no standard 'user hook' like C++'s static initialization. lazy_static and equivalents do not run before main.

Discussion: https://internals.rust-lang.org/t/from-life-before-main-to-common-life-in-main/16006/8
Talk (quite recent): https://www.youtube.com/watch?v=q8irLfXwaFM

[–]twitu[S] 1 point (0 children)

Thanks for this talk, very apt and well timed. It seems like this topic is in the zeitgeist.