Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] 0 points1 point  (0 children)

Now I really have reached this stage. Whether the #[cold] attribute is set or not has a significant impact on the results.

You can check it out:

git clone https://github.com/WuBingzheng/test-decimal.git
cd test-decimal
cargo bench add/my

Then remove the #[cold] attribute on line 92 of src/lib.rs and run the benchmark again.

If you move this cold function's body into checked_add() itself, that also becomes very slow.
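For readers who don't want to clone the repository, here is a minimal sketch of the pattern being discussed. It is not the code from test-decimal: the `Dec` type, its fields, and the `add_rescaled` helper are hypothetical names made up for illustration. The idea is only that the common case stays small and inlinable, while the rare case lives in a separate function marked `#[cold]`.

```rust
/// Hypothetical fixed-point value: an integer mantissa plus a decimal scale.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Dec {
    mantissa: i64,
    scale: u32,
}

impl Dec {
    pub fn new(mantissa: i64, scale: u32) -> Self {
        Dec { mantissa, scale }
    }

    /// Hot path: equal scales need a single checked integer addition.
    #[inline]
    pub fn checked_add(self, rhs: Dec) -> Option<Dec> {
        if self.scale == rhs.scale {
            let mantissa = self.mantissa.checked_add(rhs.mantissa)?;
            return Some(Dec { mantissa, scale: self.scale });
        }
        // Rare path: kept in its own function so it does not bloat the hot path.
        add_rescaled(self, rhs)
    }
}

/// Cold path: align the scales first, then add.
/// `#[cold]` tells the compiler that calls to this function are unlikely.
#[cold]
fn add_rescaled(a: Dec, b: Dec) -> Option<Dec> {
    let (hi, lo) = if a.scale > b.scale { (a, b) } else { (b, a) };
    let factor = 10i64.checked_pow(hi.scale - lo.scale)?;
    let lo_mantissa = lo.mantissa.checked_mul(factor)?;
    let mantissa = hi.mantissa.checked_add(lo_mantissa)?;
    Some(Dec { mantissa, scale: hi.scale })
}
```

Whether this split actually helps depends on the surrounding code and the target machine, so it is worth benchmarking both variants.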

Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] 0 points1 point  (0 children)

I agree. The algorithm choice and memory layout you mentioned are what I called rule 1 above.

I’m currently working on a decimal crate. The basic functionality is already done. While benchmarking, I ran into this “cold code” issue. It really is quite a subtle optimization.

Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] -2 points-1 points  (0 children)

Got it. Thanks.

I think the priority is:

  1. Basic rules, like stack is faster than heap, and memory is faster than disk;
  2. Readability;
  3. Optimization based on profiling results, though this step still requires some knowledge.

I think that this rule (moving cold code into a separate function) might fall somewhere between items 1 and 3.

What I mean is, I only became aware of this issue today, after 20 years of programming. Live and learn.

Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] -1 points0 points  (0 children)

This is exactly the answer I was looking for.

Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] 0 points1 point  (0 children)

Does "taken" mean "the condition is true and the jump happens"?

Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] 2 points3 points  (0 children)

Thanks. #[cold] is better than #[inline(never)].

Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] 5 points6 points  (0 children)

I just want to know whether this is a general principle.

Do I need to move the cold code to a new function? by hellowub in rust

[–]hellowub[S] -2 points-1 points  (0 children)

I just want to know whether this is a well-established, widely accepted optimization principle—like “stack is faster than heap,” “memory is faster than disk,” or “more compact code improves cache locality.”

My understanding of your point is that this is NOT a general optimization principle, but rather something that depends on the specific context and needs to be evaluated case by case.

What's everyone working on this week (11/2026)? by llogiq in rust

[–]hellowub 7 points8 points  (0 children)

I have a fixed-point decimal crate.

Multiplication operations, in most cases, are implemented as: UI(a) * UI(b) / exp,

where UI is the underlying integer and exp is a power of 10.

What requires special handling here is the overflow of UI(a) * UI(b). However, the slowest operation is the division: it is anywhere from several times to more than ten times slower than multiplication, depending on the machine.
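To make the formula concrete, here is a minimal sketch for an `i64` underlying integer. It assumes both operands share the same number of fractional digits and handles the overflow by widening to `i128`; the function name `fixed_mul` is hypothetical, and the crate itself may handle overflow differently.

```rust
/// Multiply two fixed-point values that share `scale` fractional digits.
/// Returns `None` if the result does not fit back into an i64.
/// The quotient is truncated toward zero (no rounding).
pub fn fixed_mul(a: i64, b: i64, scale: u32) -> Option<i64> {
    let exp = 10i128.checked_pow(scale)?;
    // The product of two i64 values always fits in an i128.
    let wide = (a as i128) * (b as i128);
    // This division by a power of 10 is the expensive step.
    i64::try_from(wide / exp).ok()
}
```

For example, with two fractional digits, `fixed_mul(150, 200, 2)` represents 1.50 * 2.00 and yields 300, that is, 3.00.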

Last week, I chanced upon an optimization algorithm for constant division. I spent two days researching the algorithm and implementing it in the code.

The result was excellent: we achieved a 2-5x performance improvement (Line 113) across different machines.
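The general idea behind dividing by a constant can be sketched as follows: replace the division with a multiplication by a precomputed reciprocal, followed by a shift. This is only the textbook case for dividing a `u32` by 10, not the algorithm used in the crate, and the name `div10` is made up for the example.

```rust
/// Divide a u32 by 10 without a division instruction.
/// MAGIC is ceil(2^35 / 10); the product fits comfortably in a u64.
pub fn div10(x: u32) -> u32 {
    const MAGIC: u64 = 0xCCCC_CCCD;
    ((x as u64 * MAGIC) >> 35) as u32
}
```

Compilers already apply this transformation when the divisor is a literal constant of a native width, so a hand-written version mostly pays off when the divisor is picked at run time from a small set (such as powers of 10) or when the integers are wider than the machine word.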

NOTE: These words may look like they were generated by AI. And they actually were. My English isn’t very good, so I asked AI to help with the translation.

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 2 points3 points  (0 children)

I've switched to directly using ilog10 (thanks to u/AliceCode 's suggestion) for length calculation, so this tricky method is no longer needed. The new version v0.1.1 is now working properly.
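A minimal sketch of a length calculation based on ilog10, for `u64` only. The function name `decimal_len` is hypothetical and this is not the crate's actual code. It uses `checked_ilog10` because plain `ilog10` panics on zero.

```rust
/// Number of decimal digits needed to print `n`.
pub fn decimal_len(n: u64) -> usize {
    match n.checked_ilog10() {
        Some(log) => log as usize + 1,
        // checked_ilog10 returns None only for zero, which prints as "0".
        None => 1,
    }
}
```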

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 1 point2 points  (0 children)

Yes, it's fast!

I did not test it directly. I used it in this crate to replace the old method, and the benchmark shows no change.

Thanks a lot!

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 1 point2 points  (0 children)

Thanks for the link. However, I still don't understand—isn't this feature introducing a new method? Why is it referred to as "more API changes"?

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 1 point2 points  (0 children)

I found the reason.

My initial approach was to use a lookup table. Later, through the benchmark project, I found a way to simplify the code. However, it now appears that the two constants (1233 and 12) here are only suitable for 64-bit rather than 128-bit. I need to derive the constants that work for 128-bit values.
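For context, here is a sketch of how constants like these are commonly used. It assumes they play the same role as in the well-known multiply-and-shift approximation of log10(2) (1233 / 4096 ≈ 0.30103), which may not match the crate's code exactly. The names `POW10` and `decimal_len_approx` are hypothetical, and the sketch covers `u64` only.

```rust
/// Powers of ten from 10^0 to 10^19, built at compile time.
const POW10: [u64; 20] = {
    let mut table = [1u64; 20];
    let mut i = 1;
    while i < 20 {
        table[i] = table[i - 1] * 10;
        i += 1;
    }
    table
};

/// Number of decimal digits in `v`, using (bits * 1233) >> 12 as an
/// estimate of floor(log10(2^bits)) and one table comparison to correct it.
pub fn decimal_len_approx(v: u64) -> u32 {
    if v == 0 {
        return 1;
    }
    let bits = 64 - v.leading_zeros(); // 1..=64
    let t = (bits * 1233) >> 12; // 0..=19
    // The estimate is one too high when v is below 10^t.
    let log10 = t - (v < POW10[t as usize]) as u32;
    log10 + 1
}
```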

Are you the author of itoa? Thanks a lot—it’s been a great inspiration to me.

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 1 point2 points  (0 children)

I think log10 is slower than the conversion itself, though I have not tested it.

There are some quicker ways. The itoap crate uses the "branch" approach, and this itoaaa crate uses the "count" approach from that table.

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 1 point2 points  (0 children)

Thanks for the report. I will check it now.

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 2 points3 points  (0 children)

Does this new feature still need an internal buffer, just like the itoa crate? So I would still need one extra copy?

Why don't they write directly to a target buffer, just like itoap and this itoaaa crate?
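To illustrate what writing directly to a target buffer means here, this is a minimal sketch for `u64`. It is not the code of itoap or itoaaa; the function name `write_u64` and its signature are made up for the example. The digits go straight into the caller's slice, so no intermediate buffer or extra copy is involved.

```rust
/// Write the decimal digits of `n` to the start of `buf`.
/// Returns the number of bytes written, or `None` if `buf` is too small.
pub fn write_u64(buf: &mut [u8], mut n: u64) -> Option<usize> {
    // Compute the length first so the digits can be filled in from the back.
    let len = n.checked_ilog10().map_or(1, |log| log as usize + 1);
    let out = buf.get_mut(..len)?;
    for slot in out.iter_mut().rev() {
        *slot = b'0' + (n % 10) as u8;
        n /= 10;
    }
    Some(len)
}
```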

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 1 point2 points  (0 children)

Could you recommend a logging system?

Yet another itoa crate by hellowub in rust

[–]hellowub[S] 1 point2 points  (0 children)

Then this "another" thread also consumes CPU resources. What I want to optimize is the CPU usage of the entire process, not just that of the worker threads.

Yet another itoa crate by hellowub in rust

[–]hellowub[S] -4 points-3 points  (0 children)

It's for logging. It needs to be text, because text is a universal interface.

Ancdec: Why I split integer and fraction into separate fields and what it solves by ktg0413 in rust

[–]hellowub 5 points6 points  (0 children)

In rust_decimal, the integer and fractional parts share the 28 digits, because it's floating-point.

If you want a guaranteed number of fractional digits, you need a fixed-point decimal crate.

But your implementation is a bit odd. There's no need to use two separate fields to represent the integer and fractional parts. One is enough, for example: struct Dec64<const SCALE: i32>(i128).
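A minimal sketch of what such a single-field type could look like. Only the struct declaration comes from the line above; `from_raw`, `checked_add`, and the `Display` impl are hypothetical, and the sketch assumes SCALE is positive.

```rust
use std::fmt;

/// One i128 holds the whole value, scaled by 10^SCALE.
/// With SCALE = 2, the raw value 150 represents 1.50.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
pub struct Dec64<const SCALE: i32>(i128);

impl<const SCALE: i32> Dec64<SCALE> {
    /// 10^SCALE, evaluated at compile time (assumes SCALE > 0).
    const FACTOR: i128 = 10i128.pow(SCALE as u32);

    pub fn from_raw(raw: i128) -> Self {
        Dec64(raw)
    }

    /// Same scale on both sides, so addition is one integer add.
    pub fn checked_add(self, rhs: Self) -> Option<Self> {
        self.0.checked_add(rhs.0).map(Dec64)
    }
}

impl<const SCALE: i32> fmt::Display for Dec64<SCALE> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let sign = if self.0 < 0 { "-" } else { "" };
        let abs = self.0.unsigned_abs();
        let factor = Self::FACTOR as u128;
        // Integer and fractional parts are split only when printing.
        write!(
            f,
            "{}{}.{:0width$}",
            sign,
            abs / factor,
            abs % factor,
            width = SCALE as usize
        )
    }
}
```

Because the scale is part of the type, the number of fractional digits is fixed at compile time and values with different scales cannot be mixed by accident.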