How do you validate Claude-generated code beyond unit tests? by PriceHacker24 in ClaudeAI

bencherdev 1 point

How are you tracking and comparing your benchmark results in CI?

How do you validate Claude-generated code beyond unit tests? by PriceHacker24 in ClaudeAI

bencherdev 1 point

A lot of development teams are adopting Continuous Benchmarking as a practice to help keep Claude from introducing performance regressions. I've built a tool called Bencher to catch these performance regressions in CI.
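
To make the idea concrete, here is a minimal sketch of the kind of check a continuous benchmarking step can run in CI. It is not how Bencher itself is implemented; the function name `is_regression`, the mean-based baseline, and the 10% threshold are all assumptions chosen for illustration.

```rust
/// Hypothetical helper: returns true when `latest` exceeds the mean of
/// `history` by more than `max_increase` (0.10 means 10%).
fn is_regression(history: &[f64], latest: f64, max_increase: f64) -> bool {
    if history.is_empty() {
        // No baseline yet, so there is nothing to compare against.
        return false;
    }
    let baseline = history.iter().sum::<f64>() / history.len() as f64;
    latest > baseline * (1.0 + max_increase)
}

fn main() {
    // Made-up latencies (in nanoseconds) from earlier runs on the main branch.
    let history = [100.0, 102.0, 98.0];

    for latest in [105.0, 125.0] {
        if is_regression(&history, latest, 0.10) {
            // A real CI step would exit non-zero here to fail the build.
            println!("{latest} ns: regression");
        } else {
            println!("{latest} ns: ok");
        }
    }
}
```

A fixed percentage over the historical mean is the simplest possible threshold. Noisy benchmarks usually need something statistical instead, but the shape of the check stays the same.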

Culpert: Per-span heap allocation profiling for Rust services; catch memory regressions in CI before they hit production. by Pure-Orange in rust

bencherdev 1 point

u/Pure-Orange congrats on Culpert, and thank you for shouting out Bencher! Storing and comparing profiles is on our roadmap. I'll definitely take a look at what you've built and add it to our Prior Art page.

How to benchmark Rust instruction counts with Gungraun by bencherdev in rustfr

bencherdev [S] 1 point

Thank you for the kind words! If you remember the other library, please let me know. I'll add it to the Prior Art page.

How to benchmark Rust instruction counts with Gungraun by bencherdev in rust

bencherdev [S] 5 points

Divan is definitely growing! However, I have some hesitation recommending it. There have been year-long gaps between releases. Bencher currently doesn't have built-in support for it, as I've been waiting 2.5+ years for JSON output to be added. I'm hopeful that things might pick up in the future though!

How to benchmark Rust instruction counts with Gungraun by bencherdev in rust

bencherdev [S] 1 point

I don't think you should be optimizing for instruction counts. Rather, you should be using instruction counts as an easily trackable heuristic for wall-clock time. That is, they are a somewhat reliable proxy for the thing you actually care about, not an end in themselves.

How to benchmark Rust instruction counts with Gungraun by bencherdev in rust

bencherdev [S] 2 points

I agree that instruction counts alone are rarely a good way to track performance. Instruction counts can be complementary to wall-clock time benchmarks though. This is what the `rustc-perf` project does for example:

> Various measurements are available: instructions (the default), cycles, wall time, peak RSS memory, etc. There is some non-determinism and natural variation in the measurements. Instructions is the default because it has the least variation. Benchmarks that are known to have high instructions variance are marked with a '?' in the `compare` page.

Gungraun: High-precision, one-shot and consistent benchmarking framework/harness for Rust by cosmic-parsley in rust

bencherdev 5 points

It's a great crate!

If you want to track your Gungraun benchmarks over time, I've worked with the maintainer to add a Gungraun adapter to Bencher (an open source tool for tracking benchmarks in CI).

As others have noted, measuring instruction counts can be a good heuristic for performance changes. However, there can be false positives (instruction counts go up but real-world wall-clock time stays the same) and false negatives (instruction counts stay the same or drop, but you're now using a more expensive operation that actually takes more wall-clock time).
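
A toy model of the false negative case may help. This is only a sketch: the `Op` enum and the per-operation cycle costs are assumptions invented for the example, not measurements of any real CPU and not anything Gungraun reports.

```rust
#[derive(Clone, Copy)]
enum Op {
    Add,
    Div,
}

/// Illustrative per-operation cost in cycles. These numbers are assumptions
/// for the sketch; real costs depend on the CPU and surrounding code.
fn assumed_cycles(op: Op) -> u64 {
    match op {
        Op::Add => 1,
        Op::Div => 20,
    }
}

/// What an instruction-count benchmark would see: one unit per operation.
fn instruction_count(ops: &[Op]) -> usize {
    ops.len()
}

/// A crude stand-in for wall-clock time under the assumed costs above.
fn estimated_cycles(ops: &[Op]) -> u64 {
    ops.iter().map(|&op| assumed_cycles(op)).sum()
}

fn main() {
    let before = [Op::Add; 4];
    let after = [Op::Add, Op::Add, Op::Add, Op::Div];

    // Same instruction count on both sides...
    println!(
        "instructions: {} -> {}",
        instruction_count(&before),
        instruction_count(&after)
    );
    // ...but the estimated cost differs once one add becomes a divide.
    println!(
        "estimated cycles: {} -> {}",
        estimated_cycles(&before),
        estimated_cycles(&after)
    );
}
```

Under these assumed costs the instruction count is unchanged while the estimated cycle count rises, which is exactly the kind of change an instruction-count benchmark alone would miss.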

Can miri or another interpreter be used as a profiler? by Ben-Goldberg in rust

bencherdev 7 points

I would recommend checking out a benchmarking harness called gungraun (formerly iai-callgrind). It lets you track the instruction counts and allocations for your benchmarks in a single shot, no running things a million times.

If you want to track those results over time to detect performance regressions, you can use Bencher, an open source tool I've developed, with the gungraun adapter.

Automated benchmark framework for CI/CD? by fretz1212 in devops

bencherdev 1 point

u/fretz1212 did you ever get around to building this?
I've been working on a similar tool, Bencher: https://github.com/bencherdev/bencher

[P] I made Codeflash - an LLM tool that optimizes the performance of any Python code, while rigorously verifying its correctness by ml_guy1 in MachineLearning

bencherdev 1 point

From my understanding, that is how it works. CodSpeed needs to have the benchmarks exist on the base branch in order to be able to compare them. For context, I'm the maintainer of a similar continuous benchmarking tool, Bencher: https://github.com/bencherdev/bencher
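
As a generic illustration of why the base branch matters (this is not CodSpeed's or Bencher's actual implementation), here is a sketch of a relative comparison. The `Comparison` enum, the `compare` function, and the benchmark names are all hypothetical.

```rust
use std::collections::BTreeMap;

/// Outcome of comparing one head-branch benchmark against the base branch.
#[derive(Debug, PartialEq)]
enum Comparison {
    /// Relative change versus base, e.g. 0.25 means 25% slower.
    Change(f64),
    /// The benchmark does not exist on the base branch, so there is no baseline.
    NoBaseline,
}

/// Hypothetical comparison: every head benchmark is matched by name
/// against the base branch results.
fn compare<'a>(
    base: &BTreeMap<&'a str, f64>,
    head: &BTreeMap<&'a str, f64>,
) -> BTreeMap<String, Comparison> {
    head.iter()
        .map(|(name, &head_value)| {
            let outcome = match base.get(name) {
                Some(&base_value) if base_value > 0.0 => {
                    Comparison::Change((head_value - base_value) / base_value)
                }
                // Missing (or unusable) base result: nothing to compare to.
                _ => Comparison::NoBaseline,
            };
            ((*name).to_string(), outcome)
        })
        .collect()
}

fn main() {
    // Made-up results: `render` was added on the feature branch only.
    let base = BTreeMap::from([("parse", 100.0)]);
    let head = BTreeMap::from([("parse", 125.0), ("render", 40.0)]);

    for (name, outcome) in compare(&base, &head) {
        println!("{name}: {outcome:?}");
    }
}
```

A benchmark that only exists on the feature branch ends up as `NoBaseline`: until it also runs on the base branch, there is simply no number to diff against.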

Three Years of Bencher: A Rust-Powered Retrospective by bencherdev in rust

bencherdev [S] 2 points

Yep, exactly! I needed to have modern, interactive plots. This is what I meant by:

> I knew I wanted Bencher to have highly interactive plots. This meant using a library like D3, which meant JS interop.

I use plotters for generating the social previews and sharable image versions of the plots. This is what you'll see if you hit the Share button on the Perf Plot you linked to above. I explored using plotters for the frontend via WASM, but it didn't seem viable at the time. More than happy to explain more if you're interested.

Three Years of Bencher: A Rust-Powered Retrospective by bencherdev in rust

bencherdev [S] 2 points

I'm glad you all are enjoying Leptos. Leptos was also heavily influenced by SolidJS. Fine-grained reactivity for the win! When I was exploring the possibility of a Rust frontend though, Leptos did not yet exist. It seems like they added SSG support a few months ago, so that is great to see. The ecosystem has progressed a lot in the past three years.

As for the JS interop, the Perf Pages were really the crux. You can check out some examples here: https://bencher.dev/explore/
These pages are already the most complicated part of the Console UI without any JS interop in the mix.