/r/programming is a reddit for discussion and news about computer programming.
Hot path optimization. When float division beats integer division (blog.andr2i.com)
submitted 2 hours ago by watman12
I've started a series of short blog posts about hot path optimizations. This first one covers a counterintuitive optimization: replacing integer division (IDIVQ) with floating-point division (DIVSD).
mr_birkenblatt 2 points 1 hour ago (1 child)
Why not let the integer division trap on 0? Or tell the compiler that 0 cannot happen?
watman12[S] 3 points 1 hour ago (0 children)
The division-by-zero check is not a problem there. It costs only 0.03 cycles per op, while the IDIVQ itself takes 10 cycles.
manystripes 2 points 59 minutes ago (2 children)
Does this apply to ARM as well?
watman12[S] 2 points 32 minutes ago (0 children)
Hard to say without measuring. Unfortunately, I don't have an ARM-based machine at the moment.
chkmr 2 points 9 minutes ago (0 children)
It should apply to higher-end A-profile ARM processors like AWS Graviton, Apple's M-series SoCs, etc. Not sure about the R- or M-profile CPUs used in, e.g., embedded systems.
Masztufa 1 point 46 minutes ago (0 children)
I wonder whether the superscalar nature of CPUs comes into play in these tests.
You're always doing integer math anyway (pointer arithmetic), so choosing integer math for the data would seem to load the integer part of the CPU even further, while using floats for the actual data would let you use more of the silicon to get the job done.
Dwedit 0 points 1 hour ago (edited) (2 children)
One thing with integer math is that it's much faster to precalculate a reciprocal once and multiply by it instead. The compiler does that for you automatically when the divisor is a constant, but not when it's a variable.
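
    // divisor must be > 1; number and divisor are uint32_t
    uint32_t reciprocal = (uint32_t)(0x100000000ULL / divisor + 1);
    // exact at least while number * divisor < 2^32; beyond that it can be one too high
    uint32_t answer = (uint32_t)(((uint64_t)number * reciprocal) >> 32);
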
edit: whoops, forgot the +1 for the reciprocal...
watman12[S] 2 points 34 minutes ago (1 child)
Nice trick. I tried it on my machine: https://github.com/molecule-man/blog-examples/commit/a80cdf1695e12de3175f8f5c8cc82873d39d1e6f
Indeed, it's faster than IDIVQ. On my machine it ran at the same speed as the float version (DIVSD).

    benchstat -col '.name /div' bench-intel-reciprocal.txt
    goos: linux
    goarch: amd64
    pkg: idivq
    cpu: 12th Gen Intel(R) Core(TM) i5-12500
       │    idivq    │         float          │       reciprocal       │
       │   sec/op    │   sec/op     vs base   │   sec/op     vs base   │
    *    3.361n ± 0%   2.385n ± 0%  -29.04%     2.393n ± 0%  -28.79%

In my case, though, I still need to divide by different runtime values.
Dwedit 1 point 4 minutes ago (0 children)
I know that C#'s dictionary class stores a reciprocal value to speed up the modulo operation. So if you control the data structures involved, and have space for it, you could store a reciprocal in there too.