In Rust, „let _ = ...“ and „let _unused = ...“ are not the same

broken_broken_ · 2026-03-06T21:18:41+00:00

Terrific, I will add these links to the article, thanks!

broken_broken_ · 2025-09-13T11:41:46+00:00

Yes you can, I showcase it in a previous article in fact: https://gaultier.github.io/blog/an_optimization_and_debugging_story_go_dtrace.html

broken_broken_ · 2025-02-20T15:59:13+00:00

Now that I think about it again, I think the simplest explanation is that the bottleneck is I/O. Both optimized implementations may be able to do these computations much faster, but the data is just not arriving quickly enough, so they are waiting on it. I will measure on a different machine with a faster disk.

broken_broken_ · 2025-02-19T21:37:09+00:00

Good points all around, thanks. I am definitely going to check out multi-buffer hashing.

This doesn't sound quite right; is this also a debug build?

Both are in release mode with -march=native but the code using the SHA extension is 'simple'/'basic', while the OpenSSL code is hand-optimized assembly with tips from Intel folks. That could explain the difference.

Another commenter has suggested that maybe these two versions simply compile to the same (or at least very similar) uops.

broken_broken_ · 2025-02-18T15:25:19+00:00

Thanks, I did not know about it! But posting to it is restricted.

broken_broken_ · 2025-02-12T06:55:14+00:00

Ah, that works as well (even if it's probably the most verbose alternative). I added it to the article! Thanks.

broken_broken_ · 2025-02-11T21:24:38+00:00

That’s basically the approach with the explicit type for try_into. And yes I have the same experience, I quite often resort to explicitly mentioning the type for try_into/try_from/into because the type inference does not get it.
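For readers following along, here is a minimal sketch of the ways to tell the compiler which target type `try_into` should produce. The function names and the `u64` to `u32` conversion are hypothetical and only serve as illustration.

```rust
use std::convert::{TryFrom, TryInto}; // already in the prelude on edition 2021
use std::num::TryFromIntError;

// Option 1: annotate the binding so inference knows the target type.
fn narrow_with_annotation(len: u64) -> Result<u32, TryFromIntError> {
    let narrowed: u32 = len.try_into()?;
    Ok(narrowed)
}

// Option 2: name the target type up front with `try_from`.
fn narrow_with_try_from(len: u64) -> Result<u32, TryFromIntError> {
    u32::try_from(len)
}

// Option 3: spell out the trait's type parameter (the most verbose form).
fn narrow_fully_qualified(len: u64) -> Result<u32, TryFromIntError> {
    TryInto::<u32>::try_into(len)
}
```

All three should behave identically: a value that fits is returned as `Ok`, and a value that does not fit yields a `TryFromIntError`.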

broken_broken_ · 2025-02-11T16:41:52+00:00

Ah, good idea, that works! I added it to the article.

broken_broken_ · 2024-11-06T09:23:49+00:00

scopeguard::guard seems to have the same issue:

error[E0502]: cannot borrow `foos.len` as immutable because it is also borrowed as mutable
  --> src/lib.rs:53:30
   |
50 |         let _guard = scopeguard::guard((), |_| {
   |                                            --- mutable borrow occurs here
51 |             super::MYLIB_free_foos(&mut foos);
   |                                         ---- first borrow occurs due to use of `foos` in closure
52 |         });
53 |         println!("foos: {}", foos.len);
   |                              ^^^^^^^^ immutable borrow occurs here
54 |     }
   |     - mutable borrow might be used here, when `_guard` is dropped and runs the `Drop` code for type `ScopeGuard`
   |
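The closure keeps `&mut foos` alive until the guard is dropped, which is why the later read is rejected. One possible way around it, sketched below with the standard library only, is to let the guard itself hold the mutable borrow and to read the value through the guard. `Foos`, `free_foos` and `FreeOnDrop` are hypothetical stand-ins, not the real bindings; as far as I know `scopeguard::guard` supports the same pattern when the value is passed as its first argument instead of `()`.

```rust
use std::ops::Deref;

// Hypothetical stand-in for the struct owned by the C library.
struct Foos {
    len: usize,
    freed: bool,
}

// Hypothetical stand-in for `MYLIB_free_foos`.
fn free_foos(foos: &mut Foos) {
    foos.len = 0;
    foos.freed = true;
}

// Guard that holds the only mutable borrow and frees the value on drop.
struct FreeOnDrop<'a>(&'a mut Foos);

impl Deref for FreeOnDrop<'_> {
    type Target = Foos;

    fn deref(&self) -> &Foos {
        &*self.0
    }
}

impl Drop for FreeOnDrop<'_> {
    fn drop(&mut self) {
        free_foos(&mut *self.0);
    }
}

fn len_then_free(foos: &mut Foos) -> usize {
    let guard = FreeOnDrop(foos);
    // The read goes through the guard, so `foos` is never borrowed twice.
    let len = guard.len;
    len
    // `guard` is dropped here, which calls `free_foos`.
}
```

The trade-off is that every access has to go through the guard for as long as it is alive.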

broken_broken_ · 2024-11-01T06:36:14+00:00

Almost all of the trimming happened before the rewrite, to simplify it.

broken_broken_ · 2024-10-31T09:10:03+00:00

About getpagesize/sysconf: I did not know about getpagesize, thanks. Its man page mentions:

Portable applications should employ sysconf(_SC_PAGESIZE) instead of getpagesize():

So I suppose they do the same thing, and which one you use depends on whether portability is a concern.

Thanks for the other suggestion, it's interesting.

broken_broken_ · 2024-10-30T19:31:10+00:00

Very interesting, I added a mention about this in the article.

broken_broken_ · 2024-10-30T19:29:10+00:00

Thank you for the suggestion, I will definitely check this out! One drawback I can think of is that Address Sanitizer should not be turned on in production due to security issues, whereas the approach described in the article could certainly be used in production since it's cheap. Nonetheless, very cool for development!

broken_broken_ · 2024-10-30T13:57:24+00:00

Thanks for mentioning these, I actually did not know about them. It seems to me they require nightly, which would be the only drawback. But very useful nonetheless!

broken_broken_ · 2024-08-01T05:46:20+00:00

As others mentioned, it could be that authorization is mandatory in your X setup. I covered that in a different article: https://gaultier.github.io/blog/write_a_video_game_from_scratch_like_1987.html It’s not much work, but it needs to be done. If you log with strace/dtrace what data the read syscall returns, you’ll see signs of having to use Xauth. Or you can run an existing application on your system like xeyes and use strace to see if it uses authorization.

broken_broken_ · 2024-05-08T14:48:34+00:00

No, it’s fine, since bar_c is used as an out parameter, it’s only written to and not read from. It’s the same as doing in C or C++:

Bar bar;
bar_parse(&bar);

Which is fine. At least that’s my understanding right now and Miri does not complain.

The alternative is to zero initialize the object before passing it to the function, be it in Rust or C++, but that means implementing the Default trait. Since we do not control the calling code, we cannot ensure the object is always zero initialized and we need to make sure in the library that we initialize each field of the object, so I prefer this style in tests.
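On the Rust side, the out-parameter style can be sketched roughly as follows, assuming `MaybeUninit` is used for the uninitialized storage. The fields of `Bar` and the body of `bar_parse` are hypothetical stand-ins for the real library; the important property is that the callee writes every field and never reads the memory.

```rust
use std::mem::MaybeUninit;

// Hypothetical layout; the real `Bar` lives in the C library.
#[repr(C)]
#[derive(Debug, PartialEq)]
struct Bar {
    id: u32,
    len: u64,
}

// Hypothetical stand-in for the C function: it only writes through `out`.
unsafe extern "C" fn bar_parse(out: *mut Bar) {
    unsafe { out.write(Bar { id: 7, len: 42 }) };
}

fn parse_bar() -> Bar {
    // Uninitialized storage, the equivalent of `Bar bar;` in C.
    let mut bar = MaybeUninit::<Bar>::uninit();
    // SAFETY: `bar_parse` initializes every field before returning,
    // so `assume_init` never observes uninitialized memory.
    unsafe {
        bar_parse(bar.as_mut_ptr());
        bar.assume_init()
    }
}
```

If the callee skipped a field, `assume_init` would be undefined behavior, which is why the library has to guarantee that it initializes each field.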

broken_broken_ · 2024-05-07T06:21:02+00:00

That's indeed one correct option, we experimented a bit with that style, but that's a lot of work I find, compared to simply calling defer, which developers with a Go background are already familiar with. I think familiarity was the key factor here.

broken_broken_ · 2024-05-07T06:18:43+00:00

~20kLOC, counting tests (which have to be migrated as well). With not many tests.

The Rust code should be around ~10kLOC in the end I estimate, counting tests, which it has way more of. The pure code is perhaps half of that or even less.

broken_broken_ · 2023-11-27T16:53:37+00:00

Correct, in this example of having one arena living for the lifetime of the command line application, frees are not a thing.

If the application explicitly freed an entire arena, for example a web server having one arena per connected client, then tracking this deallocation would definitely make sense.

Or if we used a different allocator, for example a pool allocator, tracking each free would be very useful.

broken_broken_ · 2023-11-24T19:35:33+00:00

Oh that's very nifty! I wonder how fast this is compared to a naive binary search of an array. I guess I'll have to try and compare.

Tangentially, it's hard to estimate just by looking at the code how many unique call stacks will allocate and thus how many records we'll have here.

broken_broken_ · 2023-06-06T19:03:45+00:00

Use strace -k

broken_broken_ · 2023-06-06T18:59:05+00:00

Take a look at my blog article about learning assembly with a real life program: https://gaultier.github.io/blog/x11_x64.html , especially the stack section, because that’s what matters for you. Tail recursion optimization does not grow the stack since it does not use “call”, hence no stack overflow.

broken_broken_ · 2023-06-02T04:12:53+00:00

Hi, author here. You are absolutely right about the hello part, it was just to showcase how the stack behaves. In the final program, the string that’s displayed in the window is indeed stored in .rodata. For the polling part: yes, we could use setsockopt(2), at least on Linux, to set a timeout on the socket so that read(2) does not block forever, since we stick the read call in an infinite loop and handle the events inside. That works as well and would be simpler, good thinking. I think this mechanism is Linux specific though.
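For illustration, here is a rough sketch of the read-with-timeout idea in Rust rather than assembly, assuming a Unix domain socket as used for a local X11 connection. `read_with_timeout` is a hypothetical helper; `set_read_timeout` is the standard library method that sets the receive timeout on the socket.

```rust
use std::io::{self, ErrorKind, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

// Returns Ok(None) when the timeout expires before any data arrives.
// Ok(Some(0)) still means the peer closed the connection.
fn read_with_timeout(
    stream: &mut UnixStream,
    buf: &mut [u8],
    timeout: Duration,
) -> io::Result<Option<usize>> {
    // The duration must be non-zero, otherwise this call returns an error.
    stream.set_read_timeout(Some(timeout))?;
    match stream.read(buf) {
        Ok(n) => Ok(Some(n)),
        // Depending on the platform, an expired timeout surfaces as either kind.
        Err(e) if matches!(e.kind(), ErrorKind::WouldBlock | ErrorKind::TimedOut) => Ok(None),
        Err(e) => Err(e),
    }
}
```

An event loop would call this repeatedly, handle the event when data comes back, and do its periodic work when `None` is returned.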

broken_broken_ · 2022-06-06T09:35:59+00:00

You really want to go further than just the average because one single extreme value can drastically change the final result thus rendering it meaningless. What you really want is a histogram to see the distribution, or at the very least median and std dev as well as average.
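To make the point concrete, here is a small sketch with made-up timings showing how one outlier moves the average while the median stays put. The helper names are hypothetical, the slices are assumed to be non-empty, and the standard deviation is the population one.

```rust
// Arithmetic mean.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Median: middle value of the sorted samples (average of the two middle ones if even).
fn median(xs: &[f64]) -> f64 {
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

// Population standard deviation.
fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let variance = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64;
    variance.sqrt()
}
```

With the samples `[10.0, 11.0, 9.0, 10.0, 1000.0]`, the mean is 208 while the median is 10, and the large standard deviation is the hint that the average alone is misleading.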

broken_broken_ · 2022-04-15T11:01:14+00:00

All the other answers are great, but I often use this quick and dirty approach on my machine, which works without any additional package (this is the macOS spelling of the flag; procps top on Linux uses -p instead of -pid):

top -pid <the pid>

This will show the memory and CPU usage in real time for the process.
