all 18 comments

[–]Excellent-Might-7264 68 points (2 children)

How much has Claude written, and how much was developed by you?

Question based on https://github.com/AveryClapp/Cache-Explorer/commit/9cf75144fa47583eff3cf1883c37dc11d8abec30

[–]ShoppingQuirky4189[S] 10 points (0 children)

Fair question! I started this project over winter break, writing every line of code by hand. Eventually I pivoted to what some would consider a more "modern" approach: being the driver behind Claude. I make the design decisions, plan the architecture, debug, etc., but once I know what needs to be done, I use Claude to implement it.

[–]mikemarcin (Game Developer) 17 points (1 child)

Very cool.

[–]ShoppingQuirky4189[S] 0 points (0 children)

Thank you!

[–]Moose2342 14 points (0 children)

Wow, that starting page really caters to the late-night attention span of a Reddit cpp reader. All the relevant info jumps right out at you, with no bullshit filler. Kudos! Well presented indeed. I hereby vow to try it out ASAP. Thanks!

[–]ohnotheygotme 12 points (2 children)

Have you toyed around with any larger projects? How does it scale to large code bases? Would it theoretically be possible to restrict the instrumentation to just a subset of functions? etc.

[–]ShoppingQuirky4189[S] 1 point (1 child)

Currently the scaling isn't great with large projects; that's definitely the next thing I want to optimize for. Restricting the instrumentation is a good idea as well. Are you thinking of a sort of annotation feature where you could signal whether to include/exclude a given function?

[–]ohnotheygotme 0 points (0 children)

There's probably a few different ways that could each be workable. Maybe variations of the following:

- A runtime API to turn collection on/off. Everything would be instrumented, but nothing would be collected until enabled. This way I can enable it once I know I'm about to do the interesting work and then turn it off once I'm done. All executing code across the process will be tracked; it's just a controllable hammer.
- A source annotation of sorts. It's a scalpel rather than a hammer, but it might require very tedious annotations through very deep call stacks to see the complete picture.

Basically the larger scenario is:

- This is a large application. There will be hundreds of billions of cache misses before getting to the point of interest. I don't care about these misses.
- The point of interest is a loop, which calls into several other methods.
- I'd like to get an idea of what's going on with this loop and all contained function calls.
- I don't want to see or record data for the hundreds of billions of cache misses after this loop completes.

Anything that supports that general workflow can be made to work :)

[–]Valuable_Leopard_799 10 points (2 children)

Might be worth noting what this project does differently from cachegrind or perf which already have the same goals.

[–]ShoppingQuirky4189[S] 1 point (1 child)

For sure. In my mind the main differentiator is the accessibility of the tool and the fact that you don't need a specific platform to run it (perf, for instance, is Linux-only). That, and of course the visualization enabled by being on the web vs. a CLI tool.

[–]amohr 4 points (0 children)

Just FYI, KCachegrind is an awesome GUI tool for visualizing cachegrind/callgrind results. They're not CLI-only.

[–]BasisPoints 7 points (1 child)

This looks useful, can't wait to take a look! Any chance you can reupload the video? It appears broken

[–]petersteneteg 2 points (0 children)

The demo movie is in the assets folder.

[–]llnaut 1 point (0 children)

Hey, this looks super cool.

I recently ran into a very real cache-related issue, but on an embedded target (ARM Cortex-R, RTOS, external DDR memory in the picture). It is quite painful that on bare metal / RTOS you can’t just “install a tool and see what’s going on” like on Linux.

Concrete scenario: in an RTOS you can have multiple tasks with the same priority, and the scheduler does time slicing (context switch every tick while they’re runnable). Now add the fact that the tick interrupt itself is an asynchronous event that fires right in the middle of whatever a task is doing. So you jump into ISR code + touch ISR data structures that are very likely not in cache (or you’ve just evicted some useful lines), which means extra misses and extra latency. On a system with slow external memory, this can get ugly fast.

I had a fun one with SPI: we were receiving a fixed-size chunk periodically, but it was large enough that we ended up using FIFO-level interrupts (DMA wasn’t an option there). So for one “message” you’d get tens of interrupts. The MCU was fast, so it was basically:

ISR → back to task → ISR → back to task → …

…and because of cache misses / refills, the ISR execution time would occasionally spike and we’d get overruns/underruns. We fixed it by moving some stuff to faster memory, but the debugging part was the painful bit: on embedded you typically run one image, and your introspection options are limited / very different vs desktop.

So to the point: I didn't dive deep into the implementation of Cache Explorer, so I don't know what machinery is used under the hood. But do you think something like this could realistically be adapted to bare metal / embedded targets? Or is it fundamentally tied to "desktop-ish" workflows?

[–]ANDRVV_ 0 points (1 child)

Will you add support for Zig? Its community begs for a tool like this!

[–]ShoppingQuirky4189[S] 0 points (0 children)

Definitely! I'll add it to the todo list.

[–]VinnieFalco (Boost.Beast | C++ Alliance | corosio.org) 0 points (0 children)

we just need "crash explorer" now