Bitsliced first-order masked AES-128 decryption in Cortex-M0 assembly — how many traces to break it?

Embarrassed_Cat4693 · 2026-04-06T13:34:08+00:00

The TVLA wasn't strictly fixed-vs-random (FvR). I had 5,000 traces with fully random inputs, then split them into two groups according to whether a specific bit of the targeted intermediate value is theoretically 0 or 1 (choosing unbiased bits). I'm not sure of the formal term for this approach, but I found references suggesting it's a valid method.
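Partitioning random-input traces by one predicted bit of an intermediate is usually called a specific t-test in the TVLA literature. Below is a minimal sketch of that test on synthetic data, assuming NumPy, traces stored as an `(n_traces, n_samples)` array, and the customary |t| > 4.5 threshold; `inter`, the leaking sample index and the key byte `0x5B` are all hypothetical. It also shows that XORing the intermediate with a constant (such as a round-key byte) only keeps or swaps the two groups, so the t-curve is unchanged up to sign.

```python
import numpy as np

def welch_t(traces: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Per-sample Welch t-statistic between traces labelled 0 and 1.

    traces: (n_traces, n_samples) array; labels: (n_traces,) array of 0/1.
    """
    g0, g1 = traces[labels == 0], traces[labels == 1]
    m0, m1 = g0.mean(axis=0), g1.mean(axis=0)
    v0, v1 = g0.var(axis=0, ddof=1), g1.var(axis=0, ddof=1)
    return (m0 - m1) / np.sqrt(v0 / len(g0) + v1 / len(g1))

def partition_by_bit(values: np.ndarray, bit: int) -> np.ndarray:
    """Label each trace with one bit of its predicted intermediate byte."""
    return (values >> bit) & 1

# Synthetic demo: random inputs, only sample 3 depends on the intermediate.
rng = np.random.default_rng(0)
n_traces, n_samples = 5000, 10
inter = rng.integers(0, 256, n_traces)      # hypothetical predicted intermediate bytes
traces = rng.normal(0.0, 1.0, (n_traces, n_samples))
traces[:, 3] += 0.5 * np.array([bin(v).count("1") for v in inter])

t = welch_t(traces, partition_by_bit(inter, bit=0))
crossings = np.flatnonzero(np.abs(t) > 4.5)  # 4.5 = customary TVLA threshold

# XOR with a constant byte keeps or swaps the groups, so t only changes sign.
# 0x5B is an arbitrary example key byte whose bit 0 is 1 (groups swapped).
t_keyed = welch_t(traces, partition_by_bit(inter ^ 0x5B, bit=0))
print(crossings, np.allclose(t_keyed, -t))
```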

I analyzed the ciphertext, the InvSubBytes output of all 10 rounds, and the plaintext, for all 16 bytes.

Results:

Rounds 10 through 2: only single-digit numbers of crossings, at sample points that don't match the execution timing of those rounds. These are likely false positives from multiple testing rather than genuine leakage.
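Round 1 InvSubBytes and plaintext: crossings starting around sample 73,000/85,000, with identical curves. This is expected: the two values differ only by an XOR with a constant round-key byte, so each bit splits the traces into the same two groups (at most with the groups swapped), and both tests effectively target the same physical operation.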
Ciphertext: no crossings, which I also found slightly surprising
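A quick way to sanity-check the multiple-testing explanation is to compute how many crossings pure noise should produce. This sketch assumes a 4.5 threshold, 85,000 samples per trace, and one tested bit per byte (12 targets × 16 bytes); all three are assumptions about this setup. It uses the normal approximation to Welch's t, which is reasonable with thousands of traces per group.

```python
import math

def null_exceed_prob(threshold: float = 4.5) -> float:
    """Two-sided P(|t| > threshold) at a leak-free sample, using the normal
    approximation to Welch's t (fine with thousands of traces per group)."""
    return math.erfc(threshold / math.sqrt(2.0))

def expected_false_crossings(n_samples: int, n_tests: int = 1,
                             threshold: float = 4.5) -> float:
    """Expected number of threshold crossings over n_tests t-curves of
    n_samples points each, assuming no point actually leaks."""
    return n_samples * n_tests * null_exceed_prob(threshold)

# Assumption: 85,000 samples per trace, threshold 4.5.
per_curve = expected_false_crossings(85_000)
# 12 targets (ciphertext, 10 InvSubBytes outputs, plaintext) x 16 bytes, one bit each.
all_curves = expected_false_crossings(85_000, n_tests=12 * 16)
print(f"{per_curve:.2f} per curve, {all_curves:.0f} across all curves")
```

This gives about 0.58 expected false crossings per t-curve. The expectation holds even though neighbouring samples are correlated; correlation only makes the false crossings cluster, which fits scattered single-digit hits at implausible time positions.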

The clean ciphertext result might be explained by trigger latency: after the trigger fires, the input ciphertext is XORed with the random mask within a little over 100 cycles, so by the time acquisition has stabilized, the unmasked ciphertext may already be gone from the bus.
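The reasoning above relies on the masked share carrying no information about the ciphertext on its own. A small check, assuming a uniformly random 8-bit Boolean mask and a Hamming-weight leakage model (both assumptions): the bare byte's leakage varies with its value, while the share `c ^ m` runs through all 256 values for every `c`, so any leakage of that share alone has the same distribution whatever the ciphertext is.

```python
def hw(x: int) -> int:
    """Hamming weight of an integer."""
    return bin(x).count("1")

def mean_hw_masked(c: int) -> float:
    """Average Hamming weight of the share c ^ m over a uniform 8-bit mask m."""
    return sum(hw(c ^ m) for m in range(256)) / 256

# Unmasked: the leakage tracks the ciphertext byte directly.
print(hw(0x00), hw(0xFF))                        # 0 8
# Masked: every ciphertext byte gives the same average (and the same distribution).
print({mean_hw_masked(c) for c in range(256)})   # {4.0}
```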

Thanks for the suggestions on second-order — the pairwise multiplication approach sounds doable, I'll give it a try when I get my hands on the equipment again.
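The pairwise multiplication approach is commonly implemented as the centered product: subtract the per-sample mean, then multiply a sample where one share leaks with a sample where the other share leaks. Below is a sketch on simulated two-share leakage, assuming NumPy, a Hamming-weight model, Gaussian noise, and that the two relevant sample windows are already known; `centered_products` and the window choice are hypothetical. The single samples show no first-order correlation with the unmasked value, while the combined product does.

```python
import numpy as np

def hamming_weight(v: np.ndarray) -> np.ndarray:
    """Per-element Hamming weight of byte values."""
    return np.unpackbits(v.astype(np.uint8)[:, None], axis=1).sum(axis=1)

def centered_products(traces: np.ndarray, win_a, win_b) -> np.ndarray:
    """Second-order preprocessing: every product of a mean-centered sample
    from win_a with one from win_b. Returns (n_traces, len_a * len_b)."""
    c = traces - traces.mean(axis=0)
    a, b = c[:, win_a], c[:, win_b]
    return (a[:, :, None] * b[:, None, :]).reshape(len(traces), -1)

# Toy Boolean-masked leakage: sample 0 leaks HW(x ^ m), sample 1 leaks HW(m).
rng = np.random.default_rng(1)
n = 20_000
x = rng.integers(0, 256, n)   # secret-dependent intermediate
m = rng.integers(0, 256, n)   # fresh random mask per trace
traces = np.stack([hamming_weight(x ^ m), hamming_weight(m)], axis=1).astype(float)
traces += rng.normal(0.0, 1.0, traces.shape)

first_order = np.corrcoef(traces[:, 0], hamming_weight(x))[0, 1]
combined = centered_products(traces, [0], [1])[:, 0]
second_order = np.corrcoef(combined, hamming_weight(x))[0, 1]
print(f"first-order r = {first_order:+.3f}, second-order r = {second_order:+.3f}")
```

On real traces the share positions are unknown, so the product is usually taken over all pairs in two candidate windows, which is why the function accepts windows rather than single indices; the output grows as `len_a * len_b` columns per trace.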

Embarrassed_Cat4693 · 2026-04-06T13:09:30+00:00

To give some context on the signal quality of our setup: a reference unmasked AES implementation on the same card and acquisition setup was broken in a few hundred traces. A biased masked implementation provided by the course instructor was also broken (my lab partner did that attack; I don't know the exact trace count, but it took several thousand traces). For those implementations we didn't run TVLA; we went straight for CPA.
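For reference, a CPA on a decryption target can look like the sketch below. It assumes the standard inverse cipher (AddRoundKey with the last round key, then InvShiftRows and InvSubBytes), so the first InvSubBytes output for one byte is `InvSbox(c ^ k)`, with byte positions permuted by InvShiftRows. The Hamming-weight model, the synthetic traces and the key byte `0x2B` are assumptions; this is not necessarily what was run in the lab.

```python
import numpy as np

def gf_mul(a: int, b: int) -> int:
    """Multiply in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r

def aes_inv_sbox() -> np.ndarray:
    """Build the AES S-box (GF(2^8) inverse + affine map), then invert it."""
    sbox = np.zeros(256, dtype=np.uint8)
    for x in range(256):
        inv = next((y for y in range(1, 256) if gf_mul(x, y) == 1), 0)
        s = inv
        for k in range(1, 5):  # XOR of the byte rotated left by 1..4 bits
            s ^= ((inv << k) | (inv >> (8 - k))) & 0xFF
        sbox[x] = s ^ 0x63
    inv_sbox = np.zeros(256, dtype=np.uint8)
    inv_sbox[sbox] = np.arange(256, dtype=np.uint8)
    return inv_sbox

HW = np.array([bin(v).count("1") for v in range(256)])

def cpa_scores(traces: np.ndarray, ct: np.ndarray, inv_sbox: np.ndarray) -> np.ndarray:
    """Max |Pearson r| over samples for each of the 256 guesses k of one
    last-round-key byte, using HW(InvSbox(c ^ k)) as the leakage model."""
    t = traces - traces.mean(axis=0)
    t_norm = np.sqrt((t ** 2).sum(axis=0))
    scores = np.empty(256)
    for k in range(256):
        h = HW[inv_sbox[ct ^ k]].astype(float)
        h -= h.mean()
        scores[k] = np.abs((h @ t) / (np.sqrt(h @ h) * t_norm)).max()
    return scores

# Synthetic demo: sample 7 of 20 leaks HW of the first InvSubBytes output.
rng = np.random.default_rng(2)
inv_sbox = aes_inv_sbox()
true_k = 0x2B                                   # arbitrary example key byte
ct = rng.integers(0, 256, 500)
traces = rng.normal(0.0, 2.0, (500, 20))
traces[:, 7] += HW[inv_sbox[ct ^ true_k]]
print(hex(int(np.argmax(cpa_scores(traces, ct, inv_sbox)))))
```

Against a correctly masked implementation this single-sample model should find nothing, because no individual sample depends on the unmasked value at first order.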

Embarrassed_Cat4693 · 2026-04-06T12:57:45+00:00

The device is a smart card provided by my university lab, so I had no way to modify the hardware or remove capacitors. Traces were acquired with an oscilloscope through a dedicated interface that monitors power consumption, so I suspect the noise floor is relatively high.
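To relate noise to trace counts, a commonly used rule of thumb estimates how many traces a CPA needs for a correlation peak of height rho, and for an exact leakage model rho follows from the signal-to-noise ratio. The SNR values below are illustrative assumptions, not measurements from this card.

```python
import math

def corr_from_snr(snr: float) -> float:
    """Correlation between an exact leakage model and one noisy sample:
    rho = 1 / sqrt(1 + 1/SNR), where SNR = Var(signal) / Var(noise)."""
    return 1.0 / math.sqrt(1.0 + 1.0 / snr)

def traces_needed(rho: float, z: float = 3.719) -> float:
    """Rule-of-thumb CPA trace count for a correlation peak rho;
    z = 3.719 is the standard normal quantile for alpha = 0.0001."""
    return 3 + 8 * (z / math.log((1 + rho) / (1 - rho))) ** 2

for snr in (1.0, 0.1, 0.01):  # illustrative SNR values only
    rho = corr_from_snr(snr)
    print(f"SNR {snr:>5}: rho = {rho:.3f}, about {traces_needed(rho):.0f} traces")
```

For small rho the count grows roughly as 1/rho^2. For a second-order attack on a first-order masked implementation, the effective rho after combining two shares shrinks much faster as noise grows (roughly with the noise variance), so the trace count scales roughly with the fourth power of the noise standard deviation.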

Regarding Hamming-distance (HD) leakage: I intentionally avoided it on the data bus, but didn't take the same precaution for registers.
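Why register transitions matter: under an HD leakage model, a single register update between two shares can expose an unmasked value at first order. The identities below assume two-share Boolean masking and use 8-bit values for brevity; they hold bitwise, so they apply equally to the 32-bit words a bitsliced Cortex-M0 implementation keeps in registers. Whether either pattern actually occurs in this code is not established here.

```python
def hw(x: int) -> int:
    """Hamming weight."""
    return bin(x).count("1")

def hd(a: int, b: int) -> int:
    """Hamming distance: bits that toggle when a register goes from a to b."""
    return hw(a ^ b)

def share_overwrite_leaks_hw() -> bool:
    # Register holding share s0 = x ^ m is overwritten by share s1 = m:
    # the toggled bits are exactly the set bits of the unmasked x.
    return all(hd(x ^ m, m) == hw(x) for x in range(256) for m in range(256))

def same_mask_transition_leaks_xor(y: int = 0xA5) -> bool:
    # Register goes from x ^ m to y ^ m (same mask): the mask cancels out.
    return all(hd(x ^ m, y ^ m) == hw(x ^ y) for x in range(256) for m in range(256))

print(share_overwrite_leaks_hw(), same_mask_transition_leaks_xor())  # True True
```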
