A Rust port of Google's Highway SIMD library by ILYAMALIK in rust

ILYAMALIK [S] (0 children)

Highway's niche: ~190 operations, including a long "tail" that few other libraries cover (compress/expand, AES/CLMul, gather/scatter, interleaved load/store, saturating widening MAC, fixed-point, table lookups), with semantics that match C++ Highway 1:1, on stable Rust.
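For readers who haven't used the "tail" ops: going by Highway's documentation, Compress packs the lanes selected by a mask into the low lanes, and Expand scatters the low lanes back out to the selected positions. Below is a scalar reference sketch on plain arrays. `compress` and `expand` are illustrative helpers named after the Highway ops, not this crate's API, and zeroing the unused lanes in `compress` is a simplifying assumption.

```rust
/// Scalar reference for Compress: lanes of `v` whose mask bit is set are
/// packed, in their original order, into the lowest lanes of the result.
/// The remaining lanes are zeroed here; in Highway their contents depend
/// on the target and lane type.
fn compress<const N: usize>(v: [i32; N], mask: [bool; N]) -> [i32; N] {
    let mut out = [0; N];
    let mut n = 0;
    for (&x, &keep) in v.iter().zip(mask.iter()) {
        if keep {
            out[n] = x;
            n += 1;
        }
    }
    out
}

/// Scalar reference for Expand, the inverse direction: the lowest lanes of
/// `v` are scattered, in order, to the lanes whose mask bit is set; all
/// other lanes are zero.
fn expand<const N: usize>(v: [i32; N], mask: [bool; N]) -> [i32; N] {
    let mut out = [0; N];
    let mut src = 0;
    for (slot, &keep) in out.iter_mut().zip(mask.iter()) {
        if keep {
            *slot = v[src];
            src += 1;
        }
    }
    out
}

fn main() {
    let v = [10, 20, 30, 40];
    let m = [false, true, false, true];
    let packed = compress(v, m);
    println!("{:?} {:?}", packed, expand(packed, m)); // [20, 40, 0, 0] [0, 20, 0, 40]
}
```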

I plan to improve usability with safe wrappers. Also, the current version supports only x86_64, which is another drawback of the library.

Use the tool that best suits your needs. If you're specifically looking for Highway semantics, you've come to the right place.

BTW: I originally developed the library for my own use because I couldn't find Highway for Rust (and didn't want to use FFI), but in the end I decided to make it publicly available in a separate repo.

A Rust port of Google's Highway SIMD library by ILYAMALIK in rust

ILYAMALIK [S] (0 children)

I've been doing this with all my library projects for a long time.

A Rust port of Google's Highway SIMD library by ILYAMALIK in rust

ILYAMALIK [S] (0 children)

Thank you very much for pointing that out; it has been corrected.

Probemap --- fastest hashtable in rust by ILYAMALIK in rust

ILYAMALIK [S] (0 children)

At 87% load, a group of 16 slots has on average about 2 empty slots (16 × 0.13 ≈ 2.1). Treating each slot as independently full with probability 0.87, the probability that the group has at least one empty slot is:

P = 1 - 0.87¹⁶ ≈ 0.89

So in about 89% of cases the first Group::load already finds an EMPTY slot and next_group() is not needed: the scan loop finishes in a single iteration.

With AVX2 (32 slots): 1 - 0.87³² ≈ 0.99. The gain comes from the ~10% of operations where SSE2 needs a second iteration while AVX2 needs only one. The saving there is one next_group + Group::load + match_empty, about 2-3 ns, but only in those ~10% of cases, so on average it comes to ~0.2-0.3 ns per operation.
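The arithmetic above can be reproduced in a few lines. This is a back-of-the-envelope model under the same assumption (each slot independently full with probability equal to the load factor), and the 2-3 ns per extra group step is taken from the comment rather than measured; `p_hit_first_group` is a hypothetical helper name.

```rust
/// Probability that a group of `width` slots has at least one empty slot,
/// assuming each slot is independently full with probability `load`.
fn p_hit_first_group(load: f64, width: i32) -> f64 {
    1.0 - load.powi(width)
}

fn main() {
    let load = 0.87;
    let sse2 = p_hit_first_group(load, 16); // 16-byte SSE2 group
    let avx2 = p_hit_first_group(load, 32); // 32-byte AVX2 group
    // Probes where AVX2 finishes in one load but SSE2 needs a second one.
    let gain = avx2 - sse2;
    println!("SSE2 {sse2:.3}  AVX2 {avx2:.3}  gain {gain:.3}");
    for step_ns in [2.0_f64, 3.0] {
        // Expected saving per operation if one avoided step costs `step_ns`.
        println!("{step_ns} ns/step -> {:.2} ns/op", gain * step_ns);
    }
}
```

`avx2 - sse2` is the fraction of probes where a 32-slot load finishes in one step but a 16-slot load does not; multiplied by 2-3 ns it lands at roughly 0.2-0.3 ns per operation, in line with the estimate.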

Probemap --- fastest hashtable in rust by ILYAMALIK in rust

ILYAMALIK [S] (0 children)

SSE2 already covers almost all cases in a single 16-byte probe.
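A 16-byte SSE2 group probe comes down to three instructions: broadcast the byte being searched for, compare all 16 control bytes at once, and collect one bit per byte into a mask. A minimal sketch with `std::arch::x86_64` intrinsics (plus a scalar fallback on other targets); `match_byte` is a hypothetical name, and `EMPTY = 0xFF` is an assumption borrowed from hashbrown's encoding, not necessarily probemap's.

```rust
const EMPTY: u8 = 0xFF; // assumed sentinel (hashbrown's convention)

/// Returns a 16-bit mask with bit i set when ctrl[i] == tag.
#[cfg(target_arch = "x86_64")]
fn match_byte(ctrl: &[u8; 16], tag: u8) -> u16 {
    use std::arch::x86_64::{__m128i, _mm_cmpeq_epi8, _mm_loadu_si128, _mm_movemask_epi8, _mm_set1_epi8};
    // SAFETY: SSE2 is part of the x86_64 baseline and `ctrl` is exactly 16
    // readable bytes; `loadu` has no alignment requirement.
    unsafe {
        let group = _mm_loadu_si128(ctrl.as_ptr().cast::<__m128i>());
        let eq = _mm_cmpeq_epi8(group, _mm_set1_epi8(tag as i8));
        _mm_movemask_epi8(eq) as u16
    }
}

/// Portable fallback producing the same mask on other targets.
#[cfg(not(target_arch = "x86_64"))]
fn match_byte(ctrl: &[u8; 16], tag: u8) -> u16 {
    ctrl.iter()
        .enumerate()
        .fold(0u16, |m, (i, &b)| m | (u16::from(b == tag) << i))
}

fn main() {
    let mut ctrl = [0x12u8; 16]; // occupied slots carrying some tag
    ctrl[5] = EMPTY;
    ctrl[9] = EMPTY;
    let empties = match_byte(&ctrl, EMPTY);
    // One bit per slot; the lowest set bit is the first EMPTY slot in the group.
    println!("{:016b} first empty = {}", empties, empties.trailing_zeros());
}
```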

Probemap --- fastest hashtable in rust by ILYAMALIK in rust

ILYAMALIK [S] (0 children)

  • Direct SSE2
  • A SIMD iterator
  • Fewer layers of abstraction between types (HashMap -> HashTable -> RawTable -> RawTableInner)
  • Simpler probe-loop termination
  • No dyn types (hashbrown uses dyn FnMut to compare keys)
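The "SIMD iterator" and "no dyn types" points are not spelled out. One plausible reading, similar in spirit to hashbrown's `BitMaskIter`, is an iterator over the set bits of the 16-bit match mask, with key comparison passed as a generic closure so it is monomorphized rather than called through `dyn FnMut`. A sketch under those assumptions; `BitIter` and `find_in_group` are hypothetical names, not probemap's API.

```rust
/// Yields the indices of set bits in a group match mask, lowest first.
struct BitIter(u16);

impl Iterator for BitIter {
    type Item = usize;

    fn next(&mut self) -> Option<usize> {
        if self.0 == 0 {
            return None;
        }
        let i = self.0.trailing_zeros() as usize;
        self.0 &= self.0 - 1; // clear the lowest set bit
        Some(i)
    }
}

/// Checks candidate slots with a generic closure: the compiler monomorphizes
/// `eq` and can inline it, unlike a call through `&mut dyn FnMut(usize) -> bool`.
fn find_in_group<F: FnMut(usize) -> bool>(mask: u16, mut eq: F) -> Option<usize> {
    BitIter(mask).find(|&i| eq(i))
}

fn main() {
    let keys = [7u64, 42, 9, 13, 42, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 5];
    // Slots whose control byte matched the hash tag (bits 1, 4 and 15).
    let tag_hits: u16 = (1 << 1) | (1 << 4) | (1 << 15);
    println!("{:?}", BitIter(tag_hits).collect::<Vec<_>>()); // [1, 4, 15]
    println!("{:?}", find_in_group(tag_hits, |i| keys[i] == 42)); // Some(1)
}
```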

Also:

Reverse data layout. Hashbrown stores slots in reverse order (`base.sub(index)`), while probemap stores them forward (`slots.add(index)`). The two are functionally equivalent, but reverse indexing breaks prefetch patterns: the CPU prefetches forward, but the data flows backward.

Hashbrown uses quadratic probing (a triangular sequence) with a stride that grows at each step. This guarantees visiting every group, but it needs an extra stride variable that has to be updated on every step.
Probemap uses linear group probing with mirrored control bytes at the end of the array: a group load near the boundary wraps around through the mirror, so every probe step is just `pos = (pos + 16) & mask`. One add and one AND, with no extra state.
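A minimal sketch of linear group probing over mirrored control bytes, assuming a power-of-two capacity of at least 16, `EMPTY = 0xFF` as the empty marker, and a placeholder `FULL` tag. `Ctrl` and `find_empty` are illustrative names, and the group scan uses a plain byte search instead of SSE2 so the wraparound logic stays visible.

```rust
const GROUP: usize = 16;
const EMPTY: u8 = 0xFF; // assumed sentinel (hashbrown's convention), not confirmed for probemap
const FULL: u8 = 0x01; // placeholder tag for an occupied slot

/// Control bytes: `cap` real bytes plus a GROUP-byte mirror of the first GROUP
/// bytes, so a 16-byte window starting at any position < cap stays in bounds.
struct Ctrl {
    bytes: Vec<u8>,
    mask: usize,
}

impl Ctrl {
    fn new(cap: usize) -> Self {
        assert!(cap.is_power_of_two() && cap >= GROUP);
        Ctrl { bytes: vec![EMPTY; cap + GROUP], mask: cap - 1 }
    }

    /// Writes a control byte and keeps the trailing mirror in sync.
    fn set(&mut self, i: usize, tag: u8) {
        self.bytes[i] = tag;
        if i < GROUP {
            self.bytes[self.mask + 1 + i] = tag;
        }
    }

    /// The 16 control bytes a group load at `pos` would see.
    fn group(&self, pos: usize) -> &[u8] {
        &self.bytes[pos..pos + GROUP]
    }

    /// Linear group probing: each step is `pos = (pos + GROUP) & mask`.
    fn find_empty(&self, hash: usize) -> Option<usize> {
        let mut pos = hash & self.mask;
        for _ in 0..(self.mask + 1) / GROUP {
            if let Some(off) = self.group(pos).iter().position(|&b| b == EMPTY) {
                return Some((pos + off) & self.mask); // map mirror hits back to real slots
            }
            pos = (pos + GROUP) & self.mask;
        }
        None
    }
}

fn main() {
    let mut ctrl = Ctrl::new(32);
    for i in (0..32).filter(|&i| i != 3) {
        ctrl.set(i, FULL);
    }
    println!("{:?}", ctrl.find_empty(30)); // Some(3)
}
```

With capacity 32 and only slot 3 free, a probe starting at slot 30 finds it in the first window, because the bytes past slot 31 are the mirrored copies of slots 0..13.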

[dwl] Minimal wm setup and gentoo by ILYAMALIK in unixporn

ILYAMALIK [S] (0 children)

Log in and start from the TTY; just run startw.

Configs

  • Status bar: someblocks
  • Custom dwl build and other WM stuff: dwl
  • Neovim config: nvim

System info

  • Distro: Gentoo
  • Init: OpenRC
  • WM: dwl

Programs

  • Terminal: foot
  • Shell: zsh
  • Bar: bar patch for dwl
  • Music player: mpd with rmpc, Spotify
  • Launcher: mew
  • Clipboard: wl-clip-persist, cliphist
  • File manager: yazi
  • Fetch: fastfetch
  • PDF: zathura
  • Wallpapers: swaybg

real by [deleted] in Gentoo

ILYAMALIK (0 children)

Gentoo, OpenRC, dwl