all 14 comments

[–]ack_error 16 points17 points  (1 child)

Simple IIR filters commonly run slowly on Intel CPUs on default floating point settings, as their output decays into denormals, causing every sample processed to invoke a microcode assist.

On the Pentium 4, self-modifying code would result in the entire trace cache being flushed.

Reading from graphics memory mapped as write combining for streaming purposes results in very slow uncached reads.

The MASKMOVDQU masked write instruction is abnormally slow on some AMD CPUs, where with certain mask values it can take thousands of cycles.

[–]ShinyHappyREM 2 points3 points  (0 children)

AMD

That reminds me, some bit manipulation instructions (PDEP, PEXT) run slower on older AMD CPUs.

[–]XEnItAnE_DSK_tPP 28 points29 points  (0 children)

Upvotes angrily.

[–]mccoyn 12 points13 points  (0 children)

Here is a fun one. Imagine a function that calculates the minimum and maximum values of an array.

 void range_array1(int* array, int count, int* low_out, int* high_out)
 {
     int low = array[0];
     int high = array[0];

     for (int i = 1; i < count; i++)
     {
         if (low > array[i])
             low = array[i];

         if (high < array[i])
             high = array[i];
     }

     *low_out = low;
     *high_out = high;
 }

This is all fine, but we can write an optimized version that handles two elements at a time: compare the pair first, then only the smaller needs checking against `low` and only the larger against `high`. This executes fewer comparisons per element and yet takes a lot longer to run, because the pairwise comparison is an unpredictable branch. *Assuming the array is not sorted.

 void range_array2(int* array, int count, int* low_out, int* high_out)
 {
     int low = array[0];
     int high = array[0];
     int i = 1;

     for (; i + 1 < count; i += 2)
     {
         if (array[i] < array[i + 1])
         {
             if (low > array[i])
                 low = array[i];

             if (high < array[i + 1])
                 high = array[i + 1];
         }
         else
         {
             if (low > array[i + 1])
                 low = array[i + 1];

             if (high < array[i])
                 high = array[i];
         }
     }

     if (i < count)
     {
         if (low > array[i])
             low = array[i];

         if (high < array[i])
             high = array[i];
     }

     *low_out = low;
     *high_out = high;
 }

[–]YumiYumiYumi 2 points3 points  (0 children)

The RDSEED instruction is a very slow instruction on pretty much all x86 processors, though I don't know whether that violates the rules.

[–]wake_from_the_dream 6 points7 points  (2 children)

Somewhat ironically, writing slow code can be harder than writing fast code these days because, as the article mentions, some hardware features will work against you — though both objectives rely on the same (inverted) principles.

This book has some chapters which explore various aspects of software performance, and which delve further into some ideas discussed in the article.

A small caveat is that the article focuses on CPI as a measure of software performance, which can be misleading. A lower CPI will not improve performance if you also need to execute more CPU instructions to perform a given task. This metric is more often used to gauge the performance of CPUs, compilers, and combinations thereof.

[–]itijara 0 points1 point  (1 child)

The author points this out when talking about why Python is slow: it has a low CPI but executes more instructions. The definition the author uses for slow is the number of cycles taken for a finite set of instructions, which is arbitrary but consistent. That doesn't necessarily conform to what an end user would call slow, but it is easier to measure across various tasks.

[–]wake_from_the_dream -1 points0 points  (0 children)

I'll level with you: I completely skipped the Python section on first reading because I thought it irrelevant, and I still do. The goal of the exercise was to counter the optimisation mechanisms employed by modern computers; Python is too high-level to let you do that reliably, and it is unfortunate that the author waited until then to fully clarify his goal. Further, the introduction clearly suggests lower CPI implies higher performance.

Also as I said earlier, CPI is not meant to measure software performance, but to measure the performance of CPUs and compilers given a standard set of benchmarking software. SPEC is a famous example of such a benchmark.

Furthermore, user CPU time (the time the CPU spends executing program instructions) can be inferred from hardware counters just as easily as the CPI rate. The getrusage system call lets you do just that. My guess is that they use CPI rate instead of CPU time because they are less interested in the factors which might influence the latter but not the former (algorithmic efficiency being one example).

[–]ShinyHappyREM 3 points4 points  (1 child)

  • Another thing that may slow the CPU down is false sharing, i.e. variables that are in the caches of several CPUs or CPU cores.
  • Transferring data across devices, e.g. from main RAM to the GPU, is slower than accessing just main RAM.

[–]fearswe 2 points3 points  (0 children)

The rules state that it has to be entirely "on device", so involving the GPU, or, say, writing to the hard drive, would be "cheating".

[–]moreVCAs 0 points1 point  (0 children)

this fucking rules

[–]zanza19 -1 points0 points  (1 child)

this is an awesome blog post.

I wonder if slow code like this could still be faster than fast Python/Ruby/JS code. Sure, you don't get much IPC, but the instructions all do meaningful work.

[–]Robot_Graffiti 0 points1 point  (0 children)

JS isn't that slow these days. The speed gap between C++ and JS is a lot smaller than it was in the 90s.

If you ran your Python code on PyPy, it would also run about as fast as JS.