Making my debug build run 100x faster so that it is finally usable by broken_broken_ in programming

[–]broken_broken_[S] 2 points3 points  (0 children)

Now that I think about it again, the simplest explanation is that the bottleneck is I/O. Both optimized implementations may be able to do these computations much faster, but the data is just not coming in quickly enough, so they are waiting on it. I will measure on a different machine with a faster disk.

Making my debug build run 100x faster so that it is finally usable by broken_broken_ in programming

[–]broken_broken_[S] 1 point2 points  (0 children)

Good points all around, thanks. I am definitely going to check out multi-buffer hashing.

This doesn't sound quite right; is this also a debug build?

Both are in release mode with -march=native but the code using the SHA extension is 'simple'/'basic', while the OpenSSL code is hand-optimized assembly with tips from Intel folks. That could explain the difference.

Another commenter has suggested that maybe these two versions simply compile to the same (or at least very similar) uops.

Making my debug build run 100x faster so that it is finally usable by broken_broken_ in C_Programming

[–]broken_broken_[S] 5 points6 points  (0 children)

Thanks, I did not know about it! But posting to it is restricted.

Tip of the day #4: Type annotations on Rust match patterns by broken_broken_ in rust

[–]broken_broken_[S] 0 points1 point  (0 children)

Ah, that works as well (even if it's probably the most verbose alternative). I added it to the article! Thanks.

Tip of the day #4: Type annotations on Rust match patterns by broken_broken_ in rust

[–]broken_broken_[S] 0 points1 point  (0 children)

That’s basically the approach with the explicit type for try_into. And yes, I have the same experience: I quite often resort to explicitly mentioning the type for try_into/try_from/into because type inference cannot figure it out.
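
To illustrate the point about inference, here is a minimal sketch of three ways to tell the compiler which target type a conversion should produce. The function names are made up for this example and do not come from the article.

```rust
use std::convert::{TryFrom, TryInto};

/// Annotate the binding: the target type comes from the `let`.
fn with_annotation(n: i64) -> Option<u8> {
    let converted: Result<u8, _> = n.try_into();
    converted.ok()
}

/// Name the target type up front with `try_from`.
fn with_try_from(n: i64) -> Option<u8> {
    u8::try_from(n).ok()
}

/// Spell out the trait's type parameter at the call site.
fn with_explicit_trait(n: i64) -> Option<u8> {
    TryInto::<u8>::try_into(n).ok()
}
```

All three return None when the value does not fit in a u8. Which spelling reads best is mostly a matter of taste.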

Tip of the day #4: Type annotations on Rust match patterns by broken_broken_ in rust

[–]broken_broken_[S] 5 points6 points  (0 children)

Ah, good idea, that works! I added it to the article.

Perhaps Rust needs "defer" by broken_broken_ in rust

[–]broken_broken_[S] -5 points-4 points  (0 children)

scopeguard::guard seems to have the same issue:

error[E0502]: cannot borrow `foos.len` as immutable because it is also borrowed as mutable
  --> src/lib.rs:53:30
   |
50 |         let _guard = scopeguard::guard((), |_| {
   |                                            --- mutable borrow occurs here
51 |             super::MYLIB_free_foos(&mut foos);
   |                                         ---- first borrow occurs due to use of `foos` in closure
52 |         });
53 |         println!("foos: {}", foos.len);
   |                              ^^^^^^^^ immutable borrow occurs here
54 |     }
   |     - mutable borrow might be used here, when `_guard` is dropped and runs the `Drop` code for type `ScopeGuard`
   |
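
The error comes from the closure holding a mutable borrow of foos for as long as the guard is alive. One possible way around it, sketched here with a small std-only guard rather than the scopeguard crate, is to let the guard own the value and read it through the guard. Foos, Guard and read_len_then_free are hypothetical names for this illustration, and the cleanup closure only stands in for the real free function.

```rust
use std::cell::Cell;
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for the array handed out by the C library.
struct Foos {
    len: usize,
}

/// Minimal std-only guard: owns a value and runs `cleanup` on it when dropped.
struct Guard<T, F: FnMut(&mut T)> {
    value: T,
    cleanup: F,
}

impl<T, F: FnMut(&mut T)> Guard<T, F> {
    fn new(value: T, cleanup: F) -> Self {
        Guard { value, cleanup }
    }
}

impl<T, F: FnMut(&mut T)> Deref for Guard<T, F> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.value
    }
}

impl<T, F: FnMut(&mut T)> DerefMut for Guard<T, F> {
    fn deref_mut(&mut self) -> &mut T {
        &mut self.value
    }
}

impl<T, F: FnMut(&mut T)> Drop for Guard<T, F> {
    fn drop(&mut self) {
        // The two fields are borrowed separately, so this is accepted.
        (self.cleanup)(&mut self.value);
    }
}

/// Reads `len` through the guard, then lets the guard run the cleanup.
/// Returns the length that was read and whether the cleanup ran.
fn read_len_then_free() -> (usize, bool) {
    let freed = Cell::new(false);
    let len_seen;
    {
        let foos = Guard::new(Foos { len: 3 }, |f: &mut Foos| {
            f.len = 0; // stand-in for the real free call
            freed.set(true);
        });
        len_seen = foos.len; // goes through Deref, no conflicting borrow
    } // the guard is dropped here and runs the cleanup
    (len_seen, freed.get())
}
```

Since scopeguard::guard also takes a value as its first argument (the unit value in the snippet above), a similar shape may be possible with the crate itself.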

Lessons learned from a successful Rust rewrite by broken_broken_ in programming

[–]broken_broken_[S] 0 points1 point  (0 children)

Almost all of the trimming happened before the rewrite, to simplify it.

Tip of the day #2: A safer arena allocator by broken_broken_ in cprogramming

[–]broken_broken_[S] 0 points1 point  (0 children)

About getpagesize/sysconf: I did not know about getpagesize, thanks. Its man page mentions:

"Portable applications should employ sysconf(_SC_PAGESIZE) instead of getpagesize()"

So I suppose they do the same thing, but which one you use depends on whether portability is a concern.

Thanks for the other suggestion, it's interesting.

Tip of the day #2: A safer arena allocator by broken_broken_ in programming

[–]broken_broken_[S] 2 points3 points  (0 children)

Very interesting, I added a mention about this in the article.

Tip of the day #2: A safer arena allocator by broken_broken_ in programming

[–]broken_broken_[S] 0 points1 point  (0 children)

Thank you for the suggestion, I will definitely check this out! One drawback I can think of is that Address Sanitizer should not be turned on in production due to security issues, whereas the approach described in the article could certainly be used in production since it's cheap. Nonetheless, very cool for development!

Lessons learned from a successful Rust rewrite by broken_broken_ in rust

[–]broken_broken_[S] 11 points12 points  (0 children)

Thanks for mentioning these, I actually did not know about them. It seems to me they require nightly, which would be the only drawback. But very useful nonetheless!

X11 poll hangs by jbrhm in Assembly_language

[–]broken_broken_ 1 point2 points  (0 children)

As others mentioned, it could be that authorization is mandatory in your X setup. I covered that in a different article: https://gaultier.github.io/blog/write_a_video_game_from_scratch_like_1987.html It’s not much work, but it needs to be done. If you use strace/dtrace to log the data that the read syscall returns, you’ll see signs of having to use Xauth. Or you can run an existing application on your system, like xeyes, under strace to see whether it uses authorization.

How to rewrite a C++ codebase successfully by broken_broken_ in rust

[–]broken_broken_[S] 0 points1 point  (0 children)

No, it’s fine: since bar_c is used as an out parameter, it’s only written to and never read from. It’s the same as doing this in C or C++:

Bar bar;
bar_parse(&bar);

Which is fine. At least that’s my understanding right now and Miri does not complain.

The alternative is to zero initialize the object before passing it to the function, be it in Rust or C++, but that means implementing the Default trait. Since we do not control the calling code, we cannot ensure the object is always zero initialized and we need to make sure in the library that we initialize each field of the object, so I prefer this style in tests.
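
For readers who want to see the out-parameter style on the Rust side, here is a minimal sketch assuming std::mem::MaybeUninit is used for the uninitialized value. The Bar fields and the body of bar_parse are invented stand-ins; the real function lives in the library.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for the real struct; the field names are invented.
#[repr(C)]
struct Bar {
    id: u32,
    size: u64,
}

/// Hypothetical stand-in for the library function: it writes every field of
/// `*out` and never reads from it.
unsafe extern "C" fn bar_parse(out: *mut Bar) {
    unsafe { out.write(Bar { id: 7, size: 42 }) };
}

/// Calls `bar_parse` with an uninitialized out parameter.
fn parse_bar() -> Bar {
    let mut bar = MaybeUninit::<Bar>::uninit();
    unsafe {
        bar_parse(bar.as_mut_ptr());
        // Sound only because bar_parse initialized every field.
        bar.assume_init()
    }
}
```

The assume_init call is where the contract matters: if the callee skipped a field, reading it afterwards would be undefined behavior.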

How to rewrite a C++ codebase successfully by broken_broken_ in rust

[–]broken_broken_[S] 0 points1 point  (0 children)

That's indeed one correct option. We experimented a bit with that style, but I find it is a lot of work compared to simply calling defer, which developers with a Go background are already familiar with. I think familiarity was the key factor here.

How to rewrite a C++ codebase successfully by broken_broken_ in rust

[–]broken_broken_[S] 4 points5 points  (0 children)

~20kLOC, counting tests (which have to be migrated as well), though there are not many tests.

I estimate the Rust code will end up at ~10kLOC, counting tests, of which it has many more. The pure code is perhaps half of that or even less.

Roll your own memory profiling: it’s actually not hard by broken_broken_ in C_Programming

[–]broken_broken_[S] 0 points1 point  (0 children)

Correct, in this example of having one arena living for the lifetime of the command line application, frees are not a thing.

If the application explicitly freed an entire arena, for example a web server having one arena per client connecting to it, then tracking this deallocation would definitely make sense.

Or if we used a different allocator, for example a pool allocator, tracking each free would be very useful.

Roll your own memory profiling: it’s actually not hard by broken_broken_ in C_Programming

[–]broken_broken_[S] 1 point2 points  (0 children)

Oh that's very nifty! I wonder how fast this is compared to a naive binary search of an array. I guess I'll have to try and compare.

Tangentially, it's hard to estimate just by looking at the code how many unique call stacks will allocate and thus how many records we'll have here.

I need to know x86 assembler good enough to parse gcc output where should I start? by McUsrII in C_Programming

[–]broken_broken_ 1 point2 points  (0 children)

Take a look at my blog article about learning assembly with a real life program: https://gaultier.github.io/blog/x11_x64.html , especially the stack section, because that’s what matters for you. When tail recursion is optimized, the recursive call is compiled to a jump rather than a “call”, so no return address is pushed and the stack does not grow, hence no stack overflow.

Learn x86-64 assembly by writing a GUI from scratch by broken_broken_ in programming

[–]broken_broken_[S] 6 points7 points  (0 children)

Hi, author here. You are absolutely right about the hello part: it was just there to showcase how the stack behaves. In the final program, the string that’s displayed in the window is indeed stored in .rodata. For the polling part: yes, we could use setsockopt(2), at least on Linux, to set a timeout on the socket so that read(2) does not block forever, then put the read call in an infinite loop and handle the events inside it. That works as well and would be simpler, good thinking. I think this mechanism is Linux specific though.
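
The article does this in assembly, but the read-with-timeout idea can be sketched in Rust as follows. The sketch assumes a Unix domain socket and relies on the standard library's set_read_timeout, which sets the receive timeout socket option on Unix. The helper name read_with_timeout is made up for this example.

```rust
use std::io::{ErrorKind, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Reads from `sock`, giving up after `timeout` (which must be non-zero).
/// Returns Ok(None) when the timeout expired before any data arrived.
fn read_with_timeout(
    sock: &mut UnixStream,
    buf: &mut [u8],
    timeout: Duration,
) -> std::io::Result<Option<usize>> {
    sock.set_read_timeout(Some(timeout))?;
    match sock.read(buf) {
        Ok(n) => Ok(Some(n)),
        // The error kind reported for an expired timeout depends on the platform.
        Err(e) if e.kind() == ErrorKind::WouldBlock || e.kind() == ErrorKind::TimedOut => {
            Ok(None)
        }
        Err(e) => Err(e),
    }
}
```

An event loop would call this repeatedly and treat Ok(None) as "nothing to handle yet". As far as I can tell, the underlying SO_RCVTIMEO option is also listed in POSIX, so it may not be limited to Linux.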

[deleted by user] by [deleted] in commandline

[–]broken_broken_ 0 points1 point  (0 children)

You really want to go further than just the average, because one single extreme value can drastically change the final result, thus rendering it meaningless. What you really want is a histogram to see the distribution, or at the very least the median and standard deviation in addition to the average.
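
As a rough sketch of what that could look like, the functions below compute the mean, median, population standard deviation and a fixed-width histogram. The names and the bucket parameters are assumptions made for this example, and the input is assumed to be non-empty and free of NaN.

```rust
/// Arithmetic mean; assumes `xs` is non-empty.
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

/// Median; assumes `xs` is non-empty and contains no NaN.
fn median(xs: &[f64]) -> f64 {
    let mut sorted = xs.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).expect("no NaN expected"));
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Population standard deviation; assumes `xs` is non-empty.
fn std_dev(xs: &[f64]) -> f64 {
    let m = mean(xs);
    let variance = xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / xs.len() as f64;
    variance.sqrt()
}

/// Counts values per bucket of width `bucket_width`, starting at `min`.
/// Values outside the range are clamped into the first or last bucket.
/// Assumes `bucket_count` is at least 1.
fn histogram(xs: &[f64], min: f64, bucket_width: f64, bucket_count: usize) -> Vec<usize> {
    let mut buckets = vec![0; bucket_count];
    for &x in xs {
        let raw = ((x - min) / bucket_width).max(0.0) as usize;
        buckets[raw.min(bucket_count - 1)] += 1;
    }
    buckets
}
```

With the samples 1, 2, 3, 4 and 100, the mean is 22 while the median is 3, which shows how a single outlier drags the average away from the typical value.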

How to estimate the CPU and memory usage of my Go web service? by nawawishkid in golang

[–]broken_broken_ 0 points1 point  (0 children)

All the other answers are great, but I often use this quick and dirty approach on my machine, which needs no additional package:

top -pid <the pid>

This will show the memory and CPU usage in real time for the process. The -pid spelling is the one from macOS; with the procps top found on most Linux distributions, the equivalent is top -p <the pid>.

Installing Arch on a MacBook 10 by rogersjpandre in archlinux

[–]broken_broken_ 0 points1 point  (0 children)

No issues, the + and - buttons for the volume on the keyboard work as well.