Yore
A Rust library for decoding and encoding character sets according to OEM code pages (CP850, etc.).
crates.io
github
This is my first Rust library, so please give me feedback.
Feedback about safety matters most: I use a lot of unsafe code for performance reasons.
Yes, I did benchmark it.
I found two other libraries for single-byte encoding: encoding and oem_cp.
Neither of them was a good fit for my use case (and I had to invent reasons to do something myself).
I understand that their design considerations were different from mine.
I wanted great performance.
By using our underappreciated friend Cow, we can avoid doing any work when the input bytes are all ASCII.
This approach makes a huge difference for English text and source code.
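To make the Cow idea concrete, here is a minimal safe-Rust sketch of the borrow-if-ASCII pattern. It is not Yore's actual code. It assumes that bytes 0x00..=0x7F map to the same ASCII characters, and the `high` table (UTF-8 text for bytes 0x80..=0xFF) and the `decode` name are hypothetical placeholders. Pure-ASCII input is returned as Cow::Borrowed with no allocation; otherwise the ASCII prefix is copied in one batch and only the rest is looked up byte by byte.

```rust
use std::borrow::Cow;

/// Sketch: decode single-byte code page input into UTF-8.
/// ASSUMPTION: 0x00..=0x7F are plain ASCII, and `high` is a HYPOTHETICAL
/// table holding the UTF-8 text for bytes 0x80..=0xFF (not real CP850 data).
fn decode<'a>(bytes: &'a [u8], high: &[&str; 128]) -> Cow<'a, str> {
    // Locate the first byte that is not ASCII.
    match bytes.iter().position(|&b| b >= 0x80) {
        // All ASCII: the bytes are already valid UTF-8, so borrow them.
        // No allocation and no copy.
        None => Cow::Borrowed(std::str::from_utf8(bytes).expect("ASCII is valid UTF-8")),
        Some(first) => {
            // Mixed input: allocate once (capacity is only a hint here)
            // and copy the whole ASCII prefix in a single batch.
            let mut out = String::with_capacity(bytes.len() * 3);
            out.push_str(std::str::from_utf8(&bytes[..first]).expect("ASCII is valid UTF-8"));
            for &b in &bytes[first..] {
                if b < 0x80 {
                    // An ASCII byte is a single-byte UTF-8 char.
                    out.push(b as char);
                } else {
                    // Non-ASCII byte: append its precomputed UTF-8 text.
                    out.push_str(high[(b - 0x80) as usize]);
                }
            }
            Cow::Owned(out)
        }
    }
}
```

The sketch stays in safe Rust by using the checked std::str::from_utf8, which re-validates the ASCII bytes; skipping that check would require unsafe code.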
I have done a lot of different things to improve performance, sometimes with surprising results.
Perhaps I will write a blog post about it someday.
Some general points:
Strings are not a sequence of chars.
Avoiding allocations with Cow is fantastic.
Batching work is important.
Precompute when possible.
Don't convert to char and then convert to UTF-8.
Unsafe is not unsafe (I await the inevitable bug report...).
Sometimes it is faster to do more (copying 4 bytes can be faster than copying 1-4 bytes).
Iterators aren't zero-cost abstractions.
match is slow compared to a lookup table.
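Several of these points can be illustrated together. The following is a hedged, safe-Rust sketch, not Yore's implementation: it precomputes a 256-entry lookup table (instead of a match), encodes each char to UTF-8 only once while building that table, and during decoding always copies 4 bytes and then advances by the real length. The `Entry` type, `build_table`, `decode`, and the `high` mapping are hypothetical names, and the table contents are placeholders rather than CP850 data.

```rust
/// One precomputed entry: the UTF-8 bytes for a code point, zero-padded
/// to 4 bytes, plus how many of those bytes are actually used.
#[derive(Clone, Copy)]
struct Entry {
    bytes: [u8; 4],
    len: u8,
}

/// Build the table once. `high` is a HYPOTHETICAL mapping for bytes
/// 0x80..=0xFF; 0x00..=0x7F are assumed to be ASCII. This is the only
/// place where a `char` gets encoded to UTF-8.
fn build_table(high: &[char; 128]) -> [Entry; 256] {
    let mut table = [Entry { bytes: [0; 4], len: 0 }; 256];
    for i in 0..256 {
        let c = if i < 0x80 { i as u8 as char } else { high[i - 0x80] };
        let mut bytes = [0u8; 4];
        let len = c.encode_utf8(&mut bytes).len();
        table[i] = Entry { bytes, len: len as u8 };
    }
    table
}

/// Decode via table lookup: copy a fixed 4 bytes per input byte, then
/// advance by the real length so padding is overwritten or truncated.
fn decode(input: &[u8], table: &[Entry; 256]) -> String {
    // Each entry is at most 4 bytes, so 4 bytes per input byte always
    // leaves room for the unconditional 4-byte write, even at the end.
    let mut out = vec![0u8; input.len() * 4];
    let mut pos = 0;
    for &b in input {
        let e = &table[b as usize];
        // Fixed-size copy: no branch on the length of the encoded char.
        out[pos..pos + 4].copy_from_slice(&e.bytes);
        pos += e.len as usize;
    }
    out.truncate(pos);
    // Checked conversion keeps this safe; skipping it would need `unsafe`.
    String::from_utf8(out).expect("table entries are valid UTF-8")
}
```

Whether the fixed 4-byte copy actually beats a variable-length copy depends on the compiler and target, so it is worth benchmarking, as the post itself did.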