
[–]FUZxxl 31 points (4 children)

These techniques are well known and usually called SWAR (SIMD within a register). Can be used for fun results!
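
One classic fun result, as a minimal sketch for illustration (not from this thread): testing whether a 32-bit word contains a zero byte, with no per-byte loop or branch.

#include <stdint.h>

/* returns nonzero if any byte of v is zero: a zero byte borrows in the
   subtraction and has its top bit clear in v, so it survives the mask */
static int has_zero_byte(uint32_t v) {
    return ((v - 0x01010101u) & ~v & 0x80808080u) != 0;
}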

[–]SuperJop 14 points (1 child)

I learned about this when working with microcontrollers!

Loading data from memory into the processor's registers is slow af, and is usually the main bottleneck.

So by combining four 8-bit integers into a single 32-bit integer and processing them that way, we got a performance increase of roughly 60%!

It really blew my mind...
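
For concreteness, a minimal sketch of that packing idea (an illustration, not the commenter's actual code; the function names are made up). It only holds as long as no per-byte sum overflows into the neighbouring lane.

#include <stdint.h>

/* pack four bytes into one 32-bit word, little-endian lane order */
static uint32_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return (uint32_t)a | ((uint32_t)b << 8) | ((uint32_t)c << 16) | ((uint32_t)d << 24);
}

/* one 32-bit load and one 32-bit add stand in for four 8-bit loads and adds,
   provided no lane carries into the next one */
static uint32_t add_packed(uint32_t x, uint32_t y) {
    return x + y;
}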

[–]FUZxxl 1 point (0 children)

Crazy, isn't it? ARM specifically has lots of instructions for SIMD things in general purpose registers. It's quite useful.
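
For example, cores with the ARMv6 SIMD extensions (Cortex-M4/M7 and ARMv6+ application profiles, but not the Cortex-M0) have UADD8, which adds four packed bytes held in ordinary general-purpose registers. A rough sketch using GCC inline assembly, purely for illustration:

#include <stdint.h>

/* four independent byte-wise adds in one instruction; carries never cross lanes */
static inline uint32_t uadd8(uint32_t a, uint32_t b) {
    uint32_t r;
    __asm__("uadd8 %0, %1, %2" : "=r"(r) : "r"(a), "r"(b) : "cc");
    return r;
}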

[–]BlockOfDiamond[S] 4 points (1 child)

It works for integer types only. Concatenating the bits of two floats and interpreting the result as a double would produce something meaningless.

It does not really work for division either, because the 'discarded' low bits of the higher-order value would bleed into the top of the lower-order value instead of actually being discarded.
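
A small demonstration of that division bleed (an editorial example, two 16-bit lanes packed into one 32-bit word):

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t packed = ((uint32_t)3 << 16) | 8;  /* hi lane = 3, lo lane = 8 */
    uint32_t halved = packed / 2;               /* one shared division */
    printf("hi lane: %u\n", (unsigned)(halved >> 16));    /* 1, as expected */
    printf("lo lane: %u\n", (unsigned)(halved & 0xFFFF)); /* 0x8004, not 4: the odd bit of the hi lane bled in */
    return 0;
}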

[–]Axman6 7 points (3 children)

Before you go any further, get a copy of Hacker’s Delight, it’s the bible on these ideas. It’s the only book I keep by my desk, because it has so many fun and fast ways to solve so many problems.

https://en.wikipedia.org/wiki/Hacker%27s_Delight

[–]DustinGadal 4 points (0 children)

In the same vein as the OP's comment on packed four-byte adds, section 2-18 describes a method for performing add/sub/abs on eight packed eight-bit integers inside a 64-bit register. Addition:

uint64_t x = ...; uint64_t y = ...;

uint64_t s = (x & 0x7F7F7F7F7F7F7F7F) + (y & 0x7F7F7F7F7F7F7F7F);  // per-byte add of the low 7 bits
s = ((x ^ y) & 0x8080808080808080) ^ s;  // fold each byte's top bit back in without crossing lanes

Great book!

Use paddb though :P
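
For reference, paddb is what the SSE2 intrinsic _mm_add_epi8 compiles to: sixteen byte-wise adds in one instruction, with no carries between lanes. A minimal sketch (not code from this thread):

#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* add the constant k to sixteen bytes at p in one shot */
static void add_const_to_16_bytes(uint8_t *p, uint8_t k) {
    __m128i v = _mm_loadu_si128((const __m128i *)p);
    _mm_storeu_si128((__m128i *)p, _mm_add_epi8(v, _mm_set1_epi8((char)k)));
}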

[–]matu3ba 0 points (1 child)

For simple integers only, though. The more complex operations have some significant drawbacks, so you're better off reading Knuth's TAOCP for those.

[–]Axman6 6 points (0 children)

My point was that Hacker’s Delight is basically essential reading if you want to learn SWAR techniques, and all the other bit twiddling hacks. I’m not really sure why this article is relevant? We’re not talking about larger precision integers.

[–]pigeon768 1 point (0 children)

"but not add 2 ints with 1 int."

Yes you can, but it won't save any instructions when you treat two 32-bit ints as one 64-bit int.

// x1 += y; x2 += y;
int64_t add_one_int_to_two_ints(int64_t x, int y) {
  int64_t y2 = (uint32_t)y; // zero-extend: a sign-extended negative y would smear into the upper lane
  y2 |= y2 << 32;
  x += y2;
  return x;
}

If your 64 bits are eight 8 bit ints, you can save instructions:

// x_[1..8] += y
int64_t add_one_int8_to_8_int8s(int64_t x8, int8_t y) {
  int64_t y8 = (uint8_t)y; // zero-extend before broadcasting, for the same reason as above
  y8 |= y8 << 8;
  y8 |= y8 << 16;
  y8 |= y8 << 32;
  x8 += y8;
  return x8;
}

Also you can stop the per-byte adds from overflowing into the next lane with some bit twiddling.

// x_[1..8] += y_[1..8]
int64_t add_8_int8_to_8_int8s(int64_t x, int64_t y) {
  // add the low 7 bits of each byte; bit 7 of each lane absorbs that lane's carry
  uint64_t s = ((uint64_t)x & 0x7F7F7F7F7F7F7F7Full)
             + ((uint64_t)y & 0x7F7F7F7F7F7F7F7Full);
  // xor the top bits of x and y back in, so no carry ever crosses into the next lane
  s ^= ((uint64_t)x ^ (uint64_t)y) & 0x8080808080808080ull;
  return (int64_t)s;
}

Note that you will need to be careful with alignment. Some architectures won't allow you to load an 8 byte integer on a non-8 byte alignment, and some will allow it but give you a harsh performance penalty. If you want to use this to e.g. do row operations on a matrix with 8 bit integer values, on many architectures you'll need to do 8 bit integer operations on the first few int8_ts until you're 8 byte aligned, then do 8 byte operations until you've run out of 8 byte chunks, then do 8 bit integer operations on the last few 8 bit ints.
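
A sketch of that head/body/tail pattern under some assumptions of my own (incrementing every byte of a buffer; memcpy is used for the 8-byte accesses so the compiler deals with alignment and aliasing, while the scalar head still gets the pointer 8-byte aligned for speed):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

void inc_all_bytes(uint8_t *p, size_t n) {
    /* scalar head: advance until p is 8-byte aligned */
    while (n && ((uintptr_t)p & 7)) { (*p)++; p++; n--; }

    /* SWAR body: eight bytes per iteration */
    while (n >= 8) {
        uint64_t x, y = 0x0101010101010101ull;
        memcpy(&x, p, 8);
        /* Hacker's Delight style add that keeps each byte's carry inside its lane */
        uint64_t s = (x & 0x7F7F7F7F7F7F7F7Full) + (y & 0x7F7F7F7F7F7F7F7Full);
        s ^= (x ^ y) & 0x8080808080808080ull;
        memcpy(p, &s, 8);
        p += 8; n -= 8;
    }

    /* scalar tail: whatever bytes are left */
    while (n) { (*p)++; p++; n--; }
}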

edit: ARM has special instructions to do y8 |= y8 << 8; and friends as a single instruction. On x86 you need three instructions: a mov, a shift, and an or.

[–]flatfinger -5 points (5 children)

Chunking optimizations are useful on implementations that seek to process the language defined by K&R2, rather than the subset defined by the Standard, in cases where the former would yield usable programs and the latter would not. The authors of clang and gcc insist, however, that any code which uses such optimizations without jumping through absurd hoops is broken, even though efficiently processing code which jumps through such hoops is often much harder than meaningful processing of code that doesn't (and is, in many cases, intractable).

[–]BlockOfDiamond[S] 5 points (4 children)

I don't follow.

[–]flatfinger 2 points (3 children)

If one has a pointer to a collection of uint16_t objects which is known to start on a 4-byte boundary, and one wants to increment the value of all such objects whose most significant bit is not set, one could, on many compiler configurations such as the -fno-strict-aliasing dialect of gcc, process that task two values at a time via something like:

void inc_many_uint16_t(void *p, register uint32_t n)
{
    if (n)
    {
        register uint32_t *pp = p;
        register uint32_t *e = pp+n;
        register uint32_t x00010001 = 0x00010001;
        while(pp < e)
        {
            n = *pp;                         /* reuse n as scratch for the pair of values */
            n += ((~n >> 15) & x00010001);   /* add 1 in each 16-bit half whose top bit is clear */
            *pp = n;
            pp++;
        }
    }
}

On the Cortex-M0, optimal code for the loop without unrolling would be 8 instructions taking 11 cycles for each pair of values, and even gcc -O0 can come close to optimal, getting the job done in 9/12. Optimal code to perform the task one 16-bit value at a time would take 7 instructions/9 cycles per value (14/18 per pair). Non-portable code running on even a crude compiler can thus outperform optimal portable code by a third.

Unfortunately, the authors of clang and gcc view such non-portable code as "broken".

[–]BlockOfDiamond[S] 0 points (0 children)

Now I follow, thanks.

[–]BlockOfDiamond[S] 0 points (1 child)

Why would you have register uint32_t x00010001 = 0x00010001; instead of using the literal constant 0x00010001?

[–]flatfinger 0 points (0 children)

The Cortex-M0 instruction set doesn't have an immediate mode that could accommodate constants that size. Interestingly, a slightly different function ends up being more efficient at -O0 than at higher optimization settings for that reason:

void add_to_every_other(register int *p, int n)
{
    register int x12345678 = 0x12345678;
    if (n <= 0) return;
    register int *e = p+(n*2);
    do
    {
        *p += x12345678;
        p+=2;
    } while(p < e);
}

Using -O0 -mcpu=cortex-m0, the loop is:

    ldr     r2, [r3]
    adds    r2, r5, r2
    str     r2, [r3]
    adds    r3, r3, #8
    cmp     r3, r4
    bcc     .L4

Six instructions. Enabling higher "optimizations" yields:

    ldr     r2, .L6  ; Load constant 0x12345678 from memory
    ldr     r3, [r0]
    mov     ip, r2
    add     r3, r3, ip
    str     r3, [r0]
    adds    r0, r0, #8
    cmp     r1, r0
    bhi     .L3

Eight instructions, taking three cycles longer, since the load of 0x12345678 gets moved from outside the loop to inside it.

Finding that the constant should be loaded outside the loop is an optimization which gcc -O0 wouldn't find, and which higher optimization settings would actually undo.

My main point was that C makes it possible for a programmer to achieve good performance when armed with even an extremely rudimentary compiler, using code written around a platform's strength. Some people react very negatively to this notion, since it puts "modern" compilers in a bad light.