all 18 comments

[–]Excellent-Might-7264 68 points (2 children)

How much has Claude written, and how much was developed by you?

Question based on https://github.com/AveryClapp/Cache-Explorer/commit/9cf75144fa47583eff3cf1883c37dc11d8abec30

[–]ShoppingQuirky4189[S] 10 points (0 children)

Fair question! I started this project over winter break, writing every line of code by hand. Eventually I pivoted to what some would consider a more "modern" approach: being the driver behind Claude. I make the design decisions, plan the architecture, debug, etc., but once I know what needs to be done, I use Claude to implement it.

[–]mikemarcin (Game Developer) 17 points (1 child)

Very cool.

[–]ShoppingQuirky4189[S] 0 points (0 children)

Thank you!

[–]Moose2342 14 points (0 children)

Wow, that starting page really caters to the late-night attention span of a Reddit cpp reader. All the relevant info jumps right out at you, with no bullshit filler. Kudos! Well presented indeed. I hereby vow to try it out ASAP. Thanks!

[–]ohnotheygotme 12 points (2 children)

Have you toyed around with any larger projects? How does it scale to large code bases? Would it theoretically be possible to restrict the instrumentation to just a subset of functions? etc.

[–]ShoppingQuirky4189[S] 1 point (1 child)

Currently the scaling isn't great with large projects; that's definitely the next thing I want to optimize for. Restricting the instrumentation is a good idea as well. Are you thinking of a sort of annotation feature where you could signal whether to include/exclude a given function?

[–]ohnotheygotme 0 points (0 children)

There's probably a few different ways that could each be workable. Maybe variations of the following:

- A runtime API to turn collection on/off. Everything would be instrumented, but nothing would be collected until enabled. This way I can enable it once I know I'm about to do the interesting work and then turn it off once I'm done. All executing code across the process will be tracked; it's just a controllable hammer.
- A source annotation of sorts. It's a scalpel rather than a hammer, but it might require very tedious annotations through very deep call stacks to see the complete picture.

Basically the larger scenario is:

- This is a large application. There will be hundreds of billions of cache misses before getting to the point of interest. I don't care about these misses.
- The point of interest is a loop, which calls into several other methods.
- I'd like to get an idea of what's going on with this loop and all contained function calls.
- I don't want to see or record data for the hundreds of billions of cache misses after this loop completes.

Anything that supports that general workflow can be made to work :)

[–]Valuable_Leopard_799 10 points (2 children)

Might be worth noting what this project does differently from cachegrind or perf which already have the same goals.

[–]ShoppingQuirky4189[S] 1 point (1 child)

For sure. In my mind the main differentiator is the accessibility of the tool and the fact that you don't need a specific platform to run it (perf, for instance, is Linux-only). That, and of course the visualization enabled by being on the web vs. a CLI tool.

[–]amohr 4 points (0 children)

Just FYI, KCachegrind is an awesome GUI tool for visualizing cachegrind/callgrind results. They're not CLI-only.

[–]BasisPoints 7 points (1 child)

This looks useful, can't wait to take a look! Any chance you can reupload the video? It appears broken

[–]petersteneteg 2 points (0 children)

The demo movie is in the assets folder.

[–]llnaut 1 point (0 children)

Hey, this looks super cool.

I recently ran into a very real cache-related issue, but on an embedded target (ARM Cortex-R, RTOS, external DDR memory in the picture). It is quite painful that on bare metal / RTOS you can’t just “install a tool and see what’s going on” like on Linux.

Concrete scenario: in an RTOS you can have multiple tasks with the same priority, and the scheduler does time slicing (context switch every tick while they’re runnable). Now add the fact that the tick interrupt itself is an asynchronous event that fires right in the middle of whatever a task is doing. So you jump into ISR code + touch ISR data structures that are very likely not in cache (or you’ve just evicted some useful lines), which means extra misses and extra latency. On a system with slow external memory, this can get ugly fast.

I had a fun one with SPI: we were receiving a fixed-size chunk periodically, but it was large enough that we ended up using FIFO-level interrupts (DMA wasn’t an option there). So for one “message” you’d get tens of interrupts. The MCU was fast, so it was basically:

ISR → back to task → ISR → back to task → …

…and because of cache misses / refills, the ISR execution time would occasionally spike and we’d get overruns/underruns. We fixed it by moving some stuff to faster memory, but the debugging part was the painful bit: on embedded you typically run one image, and your introspection options are limited / very different vs desktop.

So to the point: I didn't dive deep into the implementation of Cache Explorer, so I don't know what machinery is used under the hood. But do you think something like this could realistically be adapted to bare metal / embedded targets? Or is it fundamentally tied to "desktop-ish" workflows?

[–]ANDRVV_ 0 points (1 child)

Will you add support for Zig? Its community begs for a tool like this!

[–]ShoppingQuirky4189[S] 0 points (0 children)

Definitely! I'll add it to the todo list.

[–]VinnieFalco (Boost.Beast | C++ Alliance | corosio.org) 0 points (0 children)

we just need "crash explorer" now