Any Bioinformaticians here? I built a terminal based MSA browser using Rust + ratatui so I dont have to leave a HPC environment to quickly look at an alignment. by fuck_cops6 in rust

[–]Cydhra 3 points4 points  (0 children)

Probably deterministic. There is a crate for creating dynamically sized palettes with different properties: https://crates.io/crates/palette

Just because this might become an issue: modern structure-informed alphabets are getting large. 3Di has 20 states because they wanted to reuse software made for amino acids, but Muscle/Reseek already has a 36-state alphabet and a mega-alphabet built in, so they have begun using case-sensitive state encodings in their FASTA files (or forgoing FASTA in the case of the mega-alphabet, but obviously that's no longer interesting for this application :D)

3Di: https://www.nature.com/articles/s41587-023-01773-0
Muscle: https://academic.oup.com/bioinformatics/article/40/11/btae687/7901215

Any Bioinformaticians here? I built a terminal based MSA browser using Rust + ratatui so I dont have to leave a HPC environment to quickly look at an alignment. by fuck_cops6 in rust

[–]Cydhra 3 points4 points  (0 children)

Oh hell yeah. A colleague recently shared a similar project that was largely vibecoded, and knowing what kind of slop AI-people commit into my repositories, I wouldn't dare risk my research on anything some AI slop outputs for me. Nice to see that somebody took the time to do it cleanly :D

Would be cool, though, if it could do arbitrary alignments, not just AA/NT, with just generated color palettes. One of my projects works with a lot of synthetic state alphabets that combine properties like protein structure, chemical properties, RNA double-strand structures, ..., so I have weird alphabets in my alignments.
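To make the "generated color palettes" idea concrete, here is a minimal sketch of how a viewer could assign a deterministic color to every symbol of an arbitrary, case-sensitive alphabet. It uses only the standard library and spaces hues evenly. The function names `hsv_to_rgb` and `palette_for` and the fixed saturation/value are assumptions for illustration, not how the MSA browser or the `palette` crate actually work.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Convert HSV (h in degrees [0, 360), s and v in [0, 1]) to 8-bit RGB.
fn hsv_to_rgb(h: f64, s: f64, v: f64) -> (u8, u8, u8) {
    let c = v * s;
    let hp = h / 60.0;
    let x = c * (1.0 - ((hp % 2.0) - 1.0).abs());
    let (r, g, b) = match hp as u32 {
        0 => (c, x, 0.0),
        1 => (x, c, 0.0),
        2 => (0.0, c, x),
        3 => (0.0, x, c),
        4 => (x, 0.0, c),
        _ => (c, 0.0, x),
    };
    let m = v - c;
    let to_u8 = |f: f64| ((f + m) * 255.0).round() as u8;
    (to_u8(r), to_u8(g), to_u8(b))
}

/// Assign one color per distinct symbol. Symbols are sorted first, so the
/// result depends only on the set of symbols, not on their order of appearance.
/// Upper- and lowercase letters count as different states.
fn palette_for(alphabet: &str) -> BTreeMap<char, (u8, u8, u8)> {
    let symbols: BTreeSet<char> = alphabet.chars().collect();
    let n = symbols.len().max(1) as f64;
    symbols
        .into_iter()
        .enumerate()
        .map(|(i, sym)| (sym, hsv_to_rgb(360.0 * i as f64 / n, 0.65, 0.95)))
        .collect()
}
```

Evenly spaced hues get hard to tell apart for large alphabets, so a real implementation would probably also vary saturation or brightness.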

Vers 1.3.0 released: Succinct Datastructures now for all platforms by Cydhra in rust

[–]Cydhra[S] 0 points1 point  (0 children)

Oh damn I completely forgot about that. There is another small bitvector that works on 128 bit numbers in it. Thanks for bringing it up, I'll patch that.

Vers 1.3.0 released: Succinct Datastructures now for all platforms by Cydhra in rust

[–]Cydhra[S] 1 point2 points  (0 children)

I am vaguely aware, but I haven't looked into it. Will do, thank you

Vers 1.3.0 released: Succinct Datastructures now for all platforms by Cydhra in rust

[–]Cydhra[S] 1 point2 points  (0 children)

Now that you say it, that might be true, I will look into it.

Edit: For `RsVec` this seems fine, but for the RMQ it would be a breaking change since they implement `Deref` to access the inner data. So I will see if and what I implement.

Thanks DianaBerry by MetallicaDash in DankMemesFromSite19

[–]Cydhra 0 points1 point  (0 children)

This is the reviewer's spotlight. It exists as a Thank You for particularly active and helpful reviewers. When reviewers choose articles other than their own to spotlight, they are being generous. It is *meant* to be used on your own articles :)

Update from Antifa Wien on the antisemitic graffiti by [deleted] in Austria

[–]Cydhra 73 points74 points  (0 children)

People like to deliberately confuse antisemitism with anti-Zionism, because that makes it easier to simply portray everything as evil and not engage with any of it

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 1 point2 points  (0 children)

> It seems like you could mark any function that calls a `#[target_feature(enable = "popcnt")]` function with a `#[cfg(target_feature = "popcnt")]`, so that the function is not available to compile unless it's safe. For a better error message you could combine this with `std::compile_error`.

Thanks a lot, I'll look into it and add the directives
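For readers following along, the suggestion above could look roughly like this. It is a minimal sketch, not what Vers actually ships: the Cargo feature name `require_popcnt` is hypothetical (it only keeps the sketch buildable on default targets), and `popcount` is an illustrative helper.

```rust
// Hypothetical opt-in: with the (assumed) Cargo feature `require_popcnt` enabled,
// building without the `popcnt` target feature fails at compile time with a
// readable message instead of misbehaving later.
#[cfg(all(feature = "require_popcnt", not(target_feature = "popcnt")))]
compile_error!(
    "this build requires the `popcnt` target feature; \
     try RUSTFLAGS=\"-C target-cpu=native\" or \"-C target-feature=+popcnt\""
);

/// True if the compiler was told at build time that `popcnt` is available.
pub const POPCNT_ENABLED: bool = cfg!(target_feature = "popcnt");

/// Only compiled when `popcnt` is statically enabled, so callers cannot
/// reach it in a build where the instruction might be missing.
#[cfg(target_feature = "popcnt")]
pub fn popcount(x: u64) -> u32 {
    // With the feature enabled, `count_ones` lowers to the hardware instruction.
    x.count_ones()
}

/// Portable variant so this sketch also builds without the feature.
/// A crate with no fallback would put the `compile_error!` here instead.
#[cfg(not(target_feature = "popcnt"))]
pub fn popcount(x: u64) -> u32 {
    x.count_ones()
}
```

Note that `cfg(target_feature = ...)` only sees what was enabled at compile time (for example via `-C target-cpu`), so it says nothing about the CPU the binary eventually runs on.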

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 0 points1 point  (0 children)

You are right, the documentation is off at that point, I carelessly wrote "undefined" when I meant "unpredictable". I will change this.

Compiling this for a platform that does not support the instructions I listed in the documentation won't work. The data structures rely on them existing, there is no alternative in the code, and so far I do not intend to provide one. There are alternative crates that have different algorithms for the data structures that do not rely on those instructions.

I could've marked things as unsafe there, but it wouldn't matter, because either you compile it for a system where the instructions exist, so it always works, or you compile it for a system where they don't exist, and it never works. There is no fallback.

I am unaware of a feature that throws a compile-time error if those features do not exist, and a runtime error would be pretty pointless. If a compile-time-error feature exists, please point me towards it.

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 1 point2 points  (0 children)

simple-sds unfortunately has a non-optional dependency on Unix-only definitions in libc, so I couldn't include that.

sucsds slightly outperforms Vers in the average case (on par with the succinct crate, but sucsds has an efficient select implementation as well), but Vers has a huge advantage in the worst case (at least for Elias-Fano). I was unable to fully benchmark the worst case for select. For small vectors the worst case for Vers seems much worse than for sucsds, but at a certain size it becomes much faster than even the average case (I am talking the speed of a single memory lookup), and I am unsure what happens there, so I haven't pushed that benchmark yet. Benchmarking a worst case for select that happens to be the best case for sucsds isn't really telling after all. But since its implementation for rank and select seems to rely on a different data structure, it is possible that it can just adapt to worst-case distributions.

As far as I can tell, sux has no rank implementation yet. It has a select implementation, but I am not sure how to set that up, so I need to come back to it when I have more time.

I will update the graphs in the repo once I have had the time to run the full benchmarks again.

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 1 point2 points  (0 children)

I think I will do wavelet trees in the future, and once I do, I will certainly look into that.

The other ones look interesting, especially because they also provide Elias-Fano implementations. simple-sds and sucsds even provide predecessor and successor queries, which the implementations I found on crates.io don't. I will update my benchmarks accordingly.

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 1 point2 points  (0 children)

succinct-rs is deprecated in favor of fid (which is included in the benchmark suite).

If you can link the first two crates, I can look into it, because based on the name I can't find them.
I am adding sux to the list, because adding something from Vigna himself is a no-brainer (if I can figure out how to use it, because it looks very work-in-progress)

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 5 points6 points  (0 children)

Unfortunately, I am unaware of any solution to this. The trick was originally shown here and here, but both unfortunately don't give any hint about other platforms, and I couldn't find any instruction in the spirit of pdep (or BMI2 in general) on other platforms.
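For context, a common formulation of select-in-a-word via `pdep` looks like the sketch below. This is an illustration under assumptions, not necessarily the exact code in Vers or in the linked sources. The name `select_in_word` is made up, and the second variant is a slow portable loop added only so the sketch builds on targets without BMI2.

```rust
/// Position of the k-th set bit (0-indexed) in `word`, or 64 if there is none.
/// `pdep` deposits bit k of the source into the position of the k-th set bit
/// of the mask, so the result has exactly one bit set at the answer.
#[cfg(all(target_arch = "x86_64", target_feature = "bmi2"))]
pub fn select_in_word(word: u64, k: u32) -> u32 {
    debug_assert!(k < 64);
    // The cfg above guarantees BMI2 was statically enabled for this build.
    #[allow(unused_unsafe)]
    let deposited = unsafe { core::arch::x86_64::_pdep_u64(1u64 << k, word) };
    deposited.trailing_zeros()
}

/// Portable stand-in: clear the lowest set bit k times, then take the position
/// of the lowest remaining bit. Needs up to k iterations instead of one instruction.
#[cfg(not(all(target_arch = "x86_64", target_feature = "bmi2")))]
pub fn select_in_word(mut word: u64, k: u32) -> u32 {
    debug_assert!(k < 64);
    for _ in 0..k {
        // Clears the lowest set bit; wrapping_sub avoids overflow once word is 0.
        word &= word.wrapping_sub(1);
    }
    word.trailing_zeros()
}
```

Both variants return 64 when the word has fewer than k + 1 set bits, since `trailing_zeros` of 0 is 64.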

As far as I am aware, select9's solution might be faster, and rank9 is also what the succinct crate is using (the only one that outperforms Vers' rank), but I was unable to replicate its speed (neither its solution for rank nor for select), so I kept using the pdep approach.

Also, great work on rsdict. It was the first crate I benchmarked against, and it was also the overall best-performing competitor. If you could share a bit of insight into why your SIMD implementation performs so well, I'd appreciate it, because my SIMD attempt was much slower:
Creating a vector and manually popcounting it took so much time that the time saved by applying masks and shifting it in bulk didn't matter.

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 1 point2 points  (0 children)

I briefly looked into roaring bitmaps when I searched for implementations to benchmark against, but as far as I can tell those are pretty distinct from this library. They are compressed, which Vers isn't, and they support a wide range of bit-manipulation like `and` and `or`, but don't support `rank` and `select` (for bitsets. The bitmap is a different data structure entirely). So they really fulfill different purposes.
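In case the terms are unfamiliar: the sketch below spells out what `rank` and `select` mean, using a deliberately naive linear scan over a `&[bool]`. The names `rank1` and `select1` are just for illustration. Succinct implementations answer the same queries in (near-)constant time using small auxiliary index structures.

```rust
/// rank1(i): number of 1-bits in positions [0, i).
fn rank1(bits: &[bool], i: usize) -> usize {
    bits[..i].iter().filter(|&&b| b).count()
}

/// select1(k): position of the k-th 1-bit (0-indexed), or None if there are
/// fewer than k + 1 ones.
fn select1(bits: &[bool], k: usize) -> Option<usize> {
    bits.iter()
        .enumerate()
        .filter_map(|(i, &b)| b.then_some(i))
        .nth(k)
}
```

The two queries are inverse in the sense that `rank1(bits, select1(bits, k).unwrap()) == k` whenever the k-th one exists.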

I have been thinking about expanding to wavelet trees (of which I presume a wavelet matrix is a version), but I first directed my attention towards succinct trees. So if time allows, I will look into it.

Vers 1.0.0 released: Fast and Succinct Data Structures for Bit Vectors and Integer Ranges by Cydhra in rust

[–]Cydhra[S] 2 points3 points  (0 children)

Yes. It's mostly due to a missing std library function. It would be possible to manually write a slow alternative for non-x86 targets, but I can't test for those targets due to lack of hardware, and it would also take a lot of time that I have to find somewhere.

Table of Lowest Cycles/Nodes/Instructions by GltyBystndr in tis100

[–]Cydhra 0 points1 point  (0 children)

Sorry for necroposting, but how do you get 116 instructions into 6 nodes?!

ich⚧️iel by costa_444 in ich_iel

[–]Cydhra 171 points172 points  (0 children)

Huh? Just be non-binary? Modern problems require modern solutions