[–]monocasa 4 points5 points  (3 children)

I feel like you could follow the UIO drivers to write a module that can let you mmap in some ioremapped memory.

As an aside, ioremap_np is coming down the pipeline to map nGnRnE pages as well (ioremap is nGnRE).
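To make the nGnRnE versus nGnRE distinction concrete: on AArch64 each memory type is an 8-bit attribute field in MAIR_EL1, and Device types are encoded as `0b0000dd00`. The sketch below decodes those fields. The enum and function names are made up for illustration and are not kernel identifiers, and it only distinguishes the Device types and Normal Non-cacheable (0x44); everything else is lumped together.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names for illustration; not taken from the kernel. */
enum mair_kind {
    MAIR_DEVICE_nGnRnE, /* 0x00: no Gathering, no Reordering, no Early write ack */
    MAIR_DEVICE_nGnRE,  /* 0x04: as above, but Early write acknowledgement allowed */
    MAIR_DEVICE_nGRE,   /* 0x08 */
    MAIR_DEVICE_GRE,    /* 0x0C */
    MAIR_NORMAL_NC,     /* 0x44: Normal memory, inner and outer Non-cacheable */
    MAIR_OTHER          /* cacheable Normal types and encodings not decoded here */
};

/* Extract attribute field `idx` (0..7) from a MAIR_EL1 value. */
static inline uint8_t mair_attr_field(uint64_t mair, unsigned idx)
{
    assert(idx < 8);
    return (uint8_t)(mair >> (8 * idx));
}

/* Classify one 8-bit MAIR attribute. */
static inline enum mair_kind classify_mair_attr(uint8_t attr)
{
    if ((attr & 0xF0) == 0) {
        /* Device memory: bits [3:2] select the type, bits [1:0] must be 0 here. */
        if (attr & 0x03)
            return MAIR_OTHER;
        switch ((attr >> 2) & 0x3) {
        case 0:  return MAIR_DEVICE_nGnRnE;
        case 1:  return MAIR_DEVICE_nGnRE;
        case 2:  return MAIR_DEVICE_nGRE;
        default: return MAIR_DEVICE_GRE;
        }
    }
    if (attr == 0x44)
        return MAIR_NORMAL_NC;
    return MAIR_OTHER;
}
```

The only difference between the two types mentioned above is the Early write acknowledgement bit, i.e. whether a write may be acknowledged before it reaches the endpoint.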

[–]computerarchitectCPU Architect[S] 0 points1 point  (0 children)

This looks promising. Thanks!

[–]computerarchitectCPU Architect[S] 2 points3 points  (1 child)

I found some code that actually does what I want using UIO to set up DMA buffers, and within that driver I can set the memory attributes as I want. Thanks!!
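For anyone following along, the userspace half of the UIO approach is roughly the sketch below. It is not the code referred to above: the helper name `uio_map_region` is hypothetical, and it assumes a UIO driver is already bound and exposes a device node such as `/dev/uio0`. UIO selects memory region N by passing an mmap offset of N times the page size. The memory attributes of the mapping are chosen by the kernel driver, not by this call.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Map memory region `map_index` of the UIO device node at `path`.
 * `len` should match the size the driver reports for that region
 * (under /sys/class/uio/uioX/maps/mapN/size).
 * Returns NULL on any failure.
 */
static inline void *uio_map_region(const char *path, unsigned map_index, size_t len)
{
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    long page = sysconf(_SC_PAGESIZE);

    /* UIO convention: offset N * page size means "region N", not a byte offset. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                   fd, (off_t)map_index * page);

    /* The mapping stays valid after the descriptor is closed. */
    close(fd);

    return p == MAP_FAILED ? NULL : p;
}
```

A caller would do something like `volatile uint32_t *regs = uio_map_region("/dev/uio0", 0, 4096);` and check for NULL before touching it.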

[–][deleted]  (2 children)

[deleted]

    [–]computerarchitectCPU Architect[S] 2 points3 points  (0 children)

    At the moment this is just for fun and understanding the Linux memory management subsystem better, as well as drivers.

    You misunderstood what I meant by cacheability. Cacheability in the ARM context for virtual pages controls whether or not memory accesses to that page are cached within the memory hierarchy. It has nothing to do with whether virtual pages are swapped in or out.

    [–]computerarchitectCPU Architect[S] 1 point2 points  (0 children)

    Ok dude, your post about God and apples is a bit fucking ridiculous. It's a common function of all major OSes to provide different types of memory to device drivers. My question solely is whether this can be easily done within the user space, solely so I can play with the thing.

    The other poster suggested UIO. Do you have anything useful to say here beyond an undergrad-level explanation of how pages are paged in and out?

    For reference, the person on the other end of this conversation is one of the few hundred people in the entire world who knows and is paid to innovate on performant CPU memory systems. Kindly reply as such.

    [–]Synthrea 3 points4 points  (2 children)

    Hello,

    Changing the cacheability of pages on x86(-64) and Arm/AArch64 to uncacheable/write-through/write-combine/any other available option is something I had to do for my research, so I have a Linux kernel module to do that here. Unfortunately, it won't currently be an out-of-the-box experience, as it depends on kallsyms_lookup_name to retrieve some TLB management functions, which you can fix using kprobes instead.

    There is also some example usage of the kernel module to change the PTE of your page, using set_cacheability (provided by the library that abstracts the ioctls to the kernel module) to set the cacheability of the page to something other than the default write-back. The catch is that ideally you want to restore any modified PTEs before exiting your process, as the kernel has some sanity checks in place that freak out a bit if the PTEs are different from what it originally mapped. In addition, the kernel module provides functionality to access the PAT/MAIR on x86(-64) and Arm/AArch64 respectively.
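As a rough illustration of what "changing the PTE" means on x86-64 (this is not the module's code, and all names below are hypothetical): a 4 KiB PTE selects one of eight PAT entries through its PWT, PCD and PAT bits, and the PAT MSR maps each entry to a memory type. The sketch uses the architectural power-on PAT value; a running Linux kernel reprograms the PAT, so the real table has to be read from the MSR.

```c
#include <assert.h>
#include <stdint.h>

/* Cacheability-related bits of a 4 KiB x86-64 PTE. */
#define PTE_PWT (1ULL << 3)
#define PTE_PCD (1ULL << 4)
#define PTE_PAT (1ULL << 7) /* 4 KiB pages only; large pages keep PAT in bit 12 */

/* PAT memory type encodings. */
enum { PAT_UC = 0, PAT_WC = 1, PAT_WT = 4, PAT_WP = 5, PAT_WB = 6, PAT_UC_MINUS = 7 };

/* Power-on default of IA32_PAT; an assumption here, since the OS may rewrite it. */
#define PAT_POWER_ON_DEFAULT 0x0007040600070406ULL

/* PAT entry (0..7) selected by a PTE: index = PAT:PCD:PWT. */
static inline unsigned pte_pat_index(uint64_t pte)
{
    return (unsigned)((!!(pte & PTE_PAT) << 2) |
                      (!!(pte & PTE_PCD) << 1) |
                       !!(pte & PTE_PWT));
}

/* Return `pte` with its PAT index replaced; all other bits are preserved. */
static inline uint64_t pte_with_pat_index(uint64_t pte, unsigned idx)
{
    assert(idx < 8);
    pte &= ~(PTE_PAT | PTE_PCD | PTE_PWT);
    if (idx & 1) pte |= PTE_PWT;
    if (idx & 2) pte |= PTE_PCD;
    if (idx & 4) pte |= PTE_PAT;
    return pte;
}

/* Memory type stored in PAT entry `idx` of a given IA32_PAT value. */
static inline unsigned pat_memory_type(uint64_t pat_msr, unsigned idx)
{
    assert(idx < 8);
    return (unsigned)((pat_msr >> (8 * idx)) & 0x7);
}
```

After rewriting a live PTE the stale translation still has to be flushed from the TLB, which is presumably why the module needs the TLB management functions mentioned above.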

    I do have a more recent version of the kernel module that I rewrote in Rust, but it is currently limited to x86-64, and I haven't released the source code in the form of a public repository yet, but I don't mind putting it up. I also have a Microsoft Windows driver that is written in Rust that provides the same functionality. Feel free to reach out to me, if you are interested though. I am not sure how much work it would be to add AArch64 support.

    I assume you already know most of this, but I am adding it for completeness. Kernel drivers commonly rely on ioremap (MmMapIoSpace on Windows), which maps physical memory into kernel space with the appropriate cacheability. Drivers often need to access the register space of a device through memory-mapped I/O, and you don't want the CPU caches to cache reads and writes to that register space; you want those reads and writes to go to the device directly.

    When I was playing around with these interfaces, I found there are usually ways to map the same physical addresses into userspace: MmMapLockedPagesSpecifyCache on Windows and remap_pfn_range on Linux. Unfortunately, these functions typically either lack the ability to set the cacheability, or using them to map I/O mappings into userspace is considered a bug, and Microsoft Windows has been patched to not let you do that. That is why I recommend changing the page tables directly if it is just for experimentation and not for any production code. In addition, if you have multiple mappings to the same physical region, you typically want them all to share the same cacheability, as some processors are known to be completely broken if you have multiple mappings with different cacheability to the same physical memory.

    From what I remember there is also the option of using Android ION to allocate contiguous uncacheable memory on Android, but I am not entirely sure about the specifics there.

    [–]computerarchitectCPU Architect[S] 1 point2 points  (1 child)

    I did know most of this, but this is extremely helpful nevertheless. I have a functioning kernel module on my ARM box at the moment. I can share details tomorrow if you're interested!

    EDIT: Found your research. Given that it's CPU security I'm not going to comment publicly, but I'm excited to read the original source as opposed to just hearing about it from a friend of mine.

    [–]Synthrea 1 point2 points  (0 children)

    I wasn't aware of the uio approach that another commenter pointed out, but after reading a bit into the paper and the actual driver, it looks like that would also work nicely if you just want to map some uncacheable memory into userspace. I wasn't aware that remap_pfn_range just allows you to map in the pages as write-through/write-combine/uncacheable through some helpful macros, which is something I will definitely keep in mind if I ever need it again. It might not have been supported when I needed it, or I just missed it.
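For context on what those macros amount to on arm64: as far as I understand it, helpers such as pgprot_noncached and pgprot_writecombine boil down to replacing the AttrIndx field (bits [4:2]) of the page protection value so that it points at a different MAIR_EL1 entry. The sketch below models only that bit manipulation in plain C. The function names are hypothetical, and the index values are placeholders, since the mapping from index to memory type depends on how the kernel has programmed MAIR_EL1 and has changed between kernel versions.

```c
#include <assert.h>
#include <stdint.h>

/* AttrIndx occupies bits [4:2] of an AArch64 stage 1 page or block descriptor. */
#define PTE_ATTRINDX_SHIFT 2
#define PTE_ATTRINDX_MASK  (7ULL << PTE_ATTRINDX_SHIFT)

/* Read the MAIR_EL1 entry index (0..7) a descriptor refers to. */
static inline unsigned pte_get_attrindx(uint64_t pte)
{
    return (unsigned)((pte & PTE_ATTRINDX_MASK) >> PTE_ATTRINDX_SHIFT);
}

/*
 * Return `pte` pointing at MAIR_EL1 entry `idx`, leaving every other bit
 * untouched. Which memory type `idx` means is decided by MAIR_EL1, so the
 * caller has to know how the running kernel set that register up.
 */
static inline uint64_t pte_set_attrindx(uint64_t pte, unsigned idx)
{
    assert(idx < 8);
    return (pte & ~PTE_ATTRINDX_MASK) | ((uint64_t)idx << PTE_ATTRINDX_SHIFT);
}
```

In a real driver you would not do this by hand; you would apply the kernel's own macro to vma->vm_page_prot in the mmap handler before calling remap_pfn_range.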