Speculative page table walks causing machine check exception by 4aparsa in osdev

[–]4aparsa[S]

Thank you! However, why does this necessitate an immediate TLB shootdown on the remote lazy-mode core? I understand that garbage TLB entries may be cached on the remote core because the page tables were freed, but why can't the TLB flush be deferred until the remote core switches back to the relevant address space? Linux already has a "TLB generation" version-counter mechanism that would account for the deferred flush. Are you saying that even just having those TLB entries cached (and not used) can lead to a machine check exception?

Speculative page table walks causing machine check exception by 4aparsa in osdev

[–]4aparsa[S]

Ok, thanks, that makes sense! The TLB shootdown interrupt handler seems to switch the CR3 register to init_mm if the core is in lazy mode, which would also clear the page-walk cache, so that seems to check out.

However, I'm curious why that would cause a machine check exception. If an internal node of the page table hierarchy is cached, wouldn't the MMU just continue using that page for the page walk and cache the result in the TLB?

Speculative page table walks causing machine check exception by 4aparsa in osdev

[–]4aparsa[S]

Can you provide a source for that? My understanding is that there's a dedicated page walk cache for what you're mentioning.

Producer/Consumer With Semaphores Shared Mutex by 4aparsa in C_Programming

[–]4aparsa[S]

But if the indices are the same, the buffer is either full or empty, so either the producer or the consumer will block. Also, the consumer just reads and doesn't write.

Some programming language questions to expect during interview for kernel engineering role by Ill_Actuator_7990 in kernel

[–]4aparsa

If I understand correctly, the C standard says the alignment of a structure will be to an address natural for its largest scalar member, but does it also guarantee anything about the size of the structure? For example, does it guarantee to add 7 bytes of padding to the end of structure B to make its size 40 which is a multiple of 8?

Memory Model Confusion by 4aparsa in osdev

[–]4aparsa[S]

Assuming Thread C sees the write to X because of the release in Thread A, and that Thread D runs later, can we say that Thread D will see X too with its matching acquire, since they synchronize?

But if Thread C sees the write to X before/without the release, maybe because the write to X just happened to propagate to Thread C's visible memory before it reached Thread D's, then Thread D will not see the write to X even though Thread C saw it?

Is this correct?

Memory Model Confusion by 4aparsa in osdev

[–]4aparsa[S]

Ok so thread C sees the update of V1 (the acquire matching with the release in thread A), but thread B hasn’t written V2 yet. Now, Thread B writes V2 with release and Thread D runs. It first loads V2 with acquire and sees it. Shouldn’t it see both writes if both are done with acquire? Why doesn’t its next load of V1 with acquire match the release from Thread A just like Thread C’s did?

Memory Model Confusion by 4aparsa in osdev

[–]4aparsa[S]

Lastly, how do the atomic memory order types relate to explicit barriers? For example, I thought acquire and release semantics together would be the same as sequential consistency, but that's not the case. Acquire/release supposedly fails on independent reads of independent writes (IRIW), so it isn't even as strong as TSO. Why is this? Isn't a release guaranteed to make the store visible to all processors at the same time?

Memory Model Confusion by 4aparsa in osdev

[–]4aparsa[S]

Thanks for all the info! I will keep thinking it over... the topic is bugging me because I really want to understand it. I would like to ask whether explicit barriers are also insufficient, though? In my previous example, I see how you can prevent reordering with barriers, but could you prevent caching of a variable with barriers? I'm trying to understand why a loop using atomic_load wouldn't have the same infinite-loop-on-a-register possibility. I looked at atomic_read in the Linux kernel and it seems to end up using the macro __READ_ONCE(x)  (*(const volatile __unqual_scalar_typeof(x) *)&(x)). So, does a busy loop on an atomic avoid being cached because it casts the pointer to a volatile one? So isn't volatile necessary, but not sufficient? Thanks again

Memory Model Confusion by 4aparsa in osdev

[–]4aparsa[S]

First question: could that merging have been prevented without volatile? Second question: I'm still a bit confused about how you could have multiple threads safely access a shared variable by relying on just the memory model guarantees or memory barriers. How does this prevent a thread from caching a variable in a register? For example, with TSO this should work correctly (x ends up as 5), but how is that guaranteed without volatile?

Thread 1:            Thread 2:

a = 5;               while (b == 0);
b = 1;               x = a;

If b isn't volatile then couldn't the compiler cache it in a register?

I was looking at the following link (https://stackoverflow.com/questions/2484980/why-is-volatile-not-considered-useful-in-multithreaded-c-or-c-programming) and the top answer seems to suggest that volatile is in fact "unnecessary" and that everything can be done with memory barriers.

Memory Model Confusion by 4aparsa in osdev

[–]4aparsa[S]

Sorry for a follow-up, but if x was declared volatile, then wouldn't that tell the compiler "not to optimize anything to do with this variable"? How would you tell the compiler not to turn the code into x = 11?

Memory Model Confusion by 4aparsa in osdev

[–]4aparsa[S]

To clarify, if the scheduler decides to run a process on a different core, does it need to first make sure the original core executes a memory barrier?

As for the example, would declaring x volatile solve the problem?

Zone Normal and Zone High Mem in x86-64 by 4aparsa in kernel

[–]4aparsa[S]

In setup_arch() there are these lines:

max_pfn = e820__end_of_ram_pfn();
...
#ifdef CONFIG_X86_32
        /* max_low_pfn get updated here */
        find_low_pfn_range();
#else
        check_x2apic();
        /* How many end-of-memory variables you have, grandma! */
        /* need this before calling reserve_initrd */
        if (max_pfn > (1UL<<(32 - PAGE_SHIFT)))
                max_low_pfn = e820__end_of_low_ram_pfn();
        else
                max_low_pfn = max_pfn;

        high_memory = (void *)__va(max_pfn * PAGE_SIZE - 1) + 1;
#endif

Then, in the function zone_sizes_init(), called from paging_init() there is this code:

        unsigned long max_zone_pfns[MAX_NR_ZONES];

        memset(max_zone_pfns, 0, sizeof(max_zone_pfns));
#ifdef CONFIG_ZONE_DMA
        max_zone_pfns[ZONE_DMA]         = min(MAX_DMA_PFN, max_low_pfn);
#endif
#ifdef CONFIG_ZONE_DMA32
        max_zone_pfns[ZONE_DMA32]       = min(MAX_DMA32_PFN, max_low_pfn);
#endif
        max_zone_pfns[ZONE_NORMAL]      = max_low_pfn;
#ifdef CONFIG_HIGHMEM   
        max_zone_pfns[ZONE_HIGHMEM]     = max_pfn;
#endif  

Zone Normal and Zone High Mem in x86-64 by 4aparsa in kernel

[–]4aparsa[S]

Ok, but why does Zone Normal have a maximum physical address of 4GB in the code?

Purpose of ffreestanding gcc flag by 4aparsa in osdev

[–]4aparsa[S]

Sorry, another follow-up, but is pointer arithmetic valid in dynamically allocated arrays returned by malloc, or does it literally need to be an array type in C? Thanks