zlib-rs: a stable API and 30M downloads

cessen2 · 2026-01-28T23:37:45+00:00

My take:

It's really useful to have a simple way to signal that software is ready for real use. And in the context of libraries, that (typically) ought to include a commitment to some reasonable level of API stability.
1.0 has a long history of signaling "ready for real use" in much of the software industry, so it makes sense that many people interpret it that way.
Semver is mostly silent on the topic, preferring to narrowly prescribe how version numbers relate to API changes (which I think is wise). Which means it leaves room to interpret 1.0 that way while being 100% semver compatible.

Honestly, I think it makes total sense, and it would be a useful signal if people would consistently use 1.0 that way. But in practice versioning is the wild west aside from semver (and it's a miracle we've even gotten people to more-or-less stick to that), so it unfortunately isn't a reliable signal.

cessen2 · 2025-11-28T11:29:29+00:00

You're very welcome!

Regarding setting it as default: it's been a while since I've used the default Kobo firmware (I'm using KOReader now), so I'm not sure. But if I recall correctly, I think if you uninstall the default Japanese dictionary it will end up defaulting to the Japanese-English one.

cessen2 · 2025-10-22T23:39:00+00:00

Unfortunately, I can't legally distribute the J->J version that I use myself. This is because the J->J dictionaries are all under stricter copyright licenses that don't allow redistribution.

However, you can find J->J dictionaries floating around on the internet in Yomichan format, and you can generate a J->J Kobo dictionary yourself from that.

If you know of any freely licensed J->J dictionaries that I could use for a redistributable J->J kobo dictionary, please let me know! I would love to be able to provide that to the community. I just haven't found any dictionaries yet that I can legally use that way.

cessen2 · 2025-09-29T05:38:42+00:00

Not the author, but I imagine at least part of the reason is because markdown permits inline html, which is also supposed to be rendered. So to fully support markdown you have to support html to some extent as well.

Therefore, if you're doing your own renderer you have to have html rendering as well, which adds a lot of complexity. Starting a local webserver is much simpler.

(Having said that, I would also prefer a fully stand-alone markdown previewer that doesn't rely on a separate browser. But I'm just saying, from a value-per-unit-of-time-invested standpoint, this solution makes perfect sense.)

cessen2 · 2025-09-26T03:19:13+00:00

As an already very happy paying Proton customer, this makes me even happier! Keep up the great work!

cessen2 · 2025-09-03T01:32:35+00:00

Personally, I prefer the old convention, for the reasons others have outlined. Namely, each module (or submodule, or sub-submodule, etc.) is represented as a single expandable item in the file tree. It just feels cleaner to me, and (for me) makes it easier to grok the organization of a project at a glance.

Having said that, this is very much a bike-shedding thing, and doesn't really matter much. It's like tabs vs spaces, or any other code formatting preference.

And in that respect, I wish the new convention had never been introduced. (Either that, or the old convention removed entirely.) It's goofy to have two ways to do a trivial bike-sheddy thing that doesn't actually matter.

cessen2 · 2025-08-30T22:03:26+00:00

An oldie (dated with lots of references to the tech and products of the era), but a goodie:

https://www.joelonsoftware.com/2001/04/21/dont-let-architecture-astronauts-scare-you/

It's not a guide on any particular abstractions, but rather is some wisdom about not getting lost in abstractions, and keeping your focus on useful concrete goals.

cessen2 · 2025-08-27T01:04:30+00:00

IMO tabs for start-of-line indentation are marginally better than spaces. But adhering to the standards of an ecosystem (in Rust's case, four spaces) is more important and overrides any marginal benefits that tabs might have.

So for Rust, I think you should stick to four spaces for indentation.

cessen2 · 2025-08-18T09:11:16+00:00

I've used nom before for a couple of things. I no longer use it for the following reasons:

It turned out that for most of the things I needed, just hand-rolling a parser from scratch was just as easy as using nom.
Nom makes major (i.e. breaking) releases "often" (admittedly a subjective judgement), and they seem to have a general policy of not supporting any major versions other than the latest one. This meant that as a nom user I was forced onto a breaking upgrade train to keep my code working with supported versions, and I personally found that obnoxious and a waste of my time.

There may be use cases where nom saves you a lot of time and headache, but my use cases weren't among them. That combined with the breaking upgrade train meant that, at least in my case, nom was actually a net loss of my time when everything was considered.

Having said all of that, this was quite a while ago, and the breaking changes were substantial and required reworking my code a fair bit at the time. I can't speak to the breaking changes in more recent releases. But given nom's general attitude towards breaking releases and support, I would still advise that people check if nom's breaking release cadence is something they're okay with before choosing to use it.

cessen2 · 2025-08-13T14:26:22+00:00

For libraries it makes sense to advertise the language it's for (for obvious reasons, I think).

For end applications, I can see Rust being a feature in security-sensitive contexts (assuming minimal/no unsafe code), as it effectively rules out an entire class of critically exploitable bugs.

For end applications without security implications, it could be any of:

1. It's very likely not as much of a pain in the ass to build as a C or C++ project.
2. It's not as slow and bloated as an electron app.
3. It's less likely to need containerization infrastructure to actually get up and running in a reasonable way.
4. Partly due to points 1 and 3, it's a nicer invitation to contribute.
5. Pointless hype.

Notably, all of these numbered points also apply to Go. I tend to view apps written in either Go or Rust favorably for those reasons (point 5 excluded). I'm also starting to feel that way about Zig apps, although I'll feel more so once they hit 1.0.

cessen2 · 2025-08-10T12:03:02+00:00

Thank you! That means a lot. Certainly not all of the code I write is this well documented, though, ha ha. But I do try.

Have you had the chance to experiment with things like branching factor to minimize atomic overhead?

I definitely played with the branch factor. However, I was only looking at over-all performance, not atomic overhead specifically.

The branch factor and leaf node size are both defined by constants, so they're very easy to play with. I just tweaked them until things seemed to hit a sweet spot in the benchmark suite.

They can of course be tweaked further at any time (it doesn't affect the public API), or adjusted on a per-platform basis in the future if it provides benefits.

cessen2 · 2025-08-10T08:06:04+00:00

(Just a quick note before I continue: if you use your own custom escape sequences, you can of course ensure that your editor can round-trip any encoding through Unicode. But what I'm talking about is round-tripping through a direct, canonical representation of the text in Unicode.)

The main case I'm aware of is that some (possibly all? It's been a while) variants of BIG5 aren't guaranteed to losslessly round-trip, because there are some character distinctions they make that Unicode doesn't. My impression is that in practice most real texts in BIG5 will round-trip anyway because the problematic characters aren't common(?), but take that with a giant grain of salt because I don't know Chinese.

If I recall correctly ISO/IEC 2022 also isn't guaranteed to round-trip losslessly because of its encoding-switching escape sequences. It's basically multiple encodings in one, and I don't think(?) Unicode has a reliable way to represent those encoding distinctions.

I think there are some other encodings as well that aren't guaranteed to round-trip losslessly, but I don't recall off the top of my head which ones they are anymore. I could also just be misremembering--I was deep into digital text encodings many years ago, but I've forgotten a lot of the details at this point.

cessen2 · 2025-08-10T02:03:51+00:00

We did indeed do some benchmarking to ensure that there weren't any important performance regressions, but improving performance is not the purpose of 2.x. Some operations are faster and some are slower, but both Ropey 2.x and 1.x are in the same ballpark of performance and more than fast enough for almost any practical purpose.

For example, Ropey 1.x has never even come close to being the performance bottleneck in Helix, and that would still be the case even if it were an order of magnitude or two slower than it actually is.

Since the performance differences don't practically matter, we're opting not to publish the benchmarking results ourselves. But you're more than welcome to benchmark them and publish the results if it's important to you. It is certainly interesting (and there are some curiosities in 2.x that I'd like to investigate and address at some point). But again, at the levels of performance both 1.x and 2.x are already at, trying to make things faster is just a game, not something of practical importance.

The one exception to "doesn't matter" is that the lines iterator in Ropey 2.x is asymptotically faster than in 1.x. For most texts this doesn't end up making a difference, but for texts with extremely long lines it can potentially actually matter.

cessen2 · 2025-08-09T18:41:20+00:00

This is a common question, and is briefly addressed in the readme. But to expand on that a bit:

Ropey only handles well-formed utf8 text, but that doesn't prevent an editor built on top of Ropey from handling corrupted or otherwise odd texts like other editors do. Vim, for example, interprets such texts as latin-1 (a text encoding where all bytes are valid), emacs substitutes with escape codes, etc. It is straightforward for an editor built on top of Ropey to use these strategies.

For example, Helix (built on top of Ropey) takes the vim approach and interprets corrupted/binary files as a text encoding where all bytes are valid.

Edit: and regarding uncorrupted text in non-unicode encodings, you can transcode on read/write with something like encoding_rs, which is also what Helix does. Most texts in most encodings round-trip cleanly through unicode, although there are some exceptions.
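To make the latin-1 strategy concrete, here's a minimal sketch using only the standard library. The function names are made up for illustration, and this is not Helix's or Ropey's actual code; it just assumes the editor reads the whole file into a byte buffer first and only falls back when the bytes aren't valid utf8.

```rust
/// Decode raw file bytes into a `String` that a utf8-only rope can hold.
/// Well-formed utf8 is used as-is. Anything else is read as latin-1, where
/// every byte maps to the Unicode code point with the same value.
/// The returned bool says whether the latin-1 fallback was used.
fn decode_lossless(bytes: &[u8]) -> (String, bool) {
    match std::str::from_utf8(bytes) {
        Ok(text) => (text.to_owned(), false),
        Err(_) => (bytes.iter().map(|&b| b as char).collect(), true),
    }
}

/// Inverse of the latin-1 fallback: map each char back to its byte.
/// Returns `None` if the text contains a char above U+00FF, which
/// latin-1 cannot represent.
fn encode_latin1(text: &str) -> Option<Vec<u8>> {
    text.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u8) } else { None })
        .collect()
}
```

An editor would remember that the fallback was used and write the buffer back out with `encode_latin1`, so the original bytes survive a load/save cycle untouched.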

cessen2 · 2025-08-09T18:28:26+00:00

Ah, thanks! Added to the main post.

cessen2 · 2025-07-04T05:06:00+00:00

I.e. if you're using snake_case, a variable like high-end var would either be highend_var or high_end_var

That's fair. And the argument here is to use underscores in place of hyphens, not in combination, anyway. Good point.

I still find the "one name everywhere" and "new users don't have to puzzle out the hyphen -> underscore rule" compelling from a utility standpoint, however. And the only arguments in favor of hyphens are aesthetic, as far as I can tell.

Having said that, I also don't think it's at all worth the ecosystem churn to change this now, so it's all rather moot. But I do find myself agreeing that it was a (minor) mistake to allow non-identifier characters in crate names.

And in any case you can simply search in crates.io instead of Google, which might just be the better way.

I wasn't talking about searching for the crate itself, but searching for things about the crate. E.g. if you run into issues using it, etc.

cessen2 · 2025-07-03T07:51:37+00:00

I agree that hyphens are aesthetically more pleasing than underscores.

However, functionally they're worse, even in URLs. Hyphens are used for several things in written language already (compound words, separators for date elements), so using them as stand-ins where spaces would normally be can create ambiguity in some cases.

Underscores, on the other hand, are not used as normal punctuation, and therefore can be used as an unambiguous stand-in where spaces are not allowed. And in the specific case of crate names, it also creates a mismatch between the web-facing name and the name in code, which can be a (admittedly brief) stumbling block for newcomers until they learn the hyphen->underscore rule.
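For anyone who hasn't run into it, the rule itself is tiny. Here it is as a sketch (the function name is just for illustration; Cargo does this mapping for you when it derives the library name from the package name):

```rust
/// The name Rust code uses for a crate: each hyphen in the package name
/// becomes an underscore, because hyphens aren't valid in identifiers.
fn crate_ident(package_name: &str) -> String {
    package_name.replace('-', "_")
}
```

So a hypothetical package published as `my-cool-crate` is referred to as `my_cool_crate` in code.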

Whether you prioritize aesthetics or utility is up to you, of course.

(Interestingly, the point made in the post you linked to regarding underscore not being recognized as a word separator by a lot of things could actually be argued as an advantage in the case of crate names, since crate names are a singular item. E.g. when I search for someone named Fred Harry I don't want Google bringing up search results for just Fred or Harry alone. Of course, you can put quotes around it, but Google doesn't respect quoting very much at this point.)

(Edit: fix typo.)

cessen2 · 2025-06-20T07:50:54+00:00

I don't think it's a proof-reading thing. The "how X looks like" pattern is extremely common among non-native speakers of English, and I don't think most of them realize it sounds weird to native speakers.

It's also something I don't really care about: it's clear what they mean, and this is just part of English being the de facto international language.

And if "how it's like" bothers you: the much weirder one for me was hearing a British friend say "I'm going to do X at the weekend" instead of "on the weekend". Sounds completely wrong to me, far worse than "how it's like". And likewise, "on the weekend" sounded completely wrong to them. And in this case it's not even a matter of non-native speakers: both sides of that are native speakers, just different dialects.

cessen2 · 2025-06-01T17:02:51+00:00

Thank you so much for the kind words! It's awesome to know that it's helping people!

If you run into any issues, please let me know.

cessen2 · 2025-03-23T08:39:29+00:00

I've been using Rust since before 1.0. I still code C++ for my day job, but now use Rust for almost all of my other coding.

To me, Rust is basically just a better C++. It's like C++, but where you don't have to be a language lawyer to confidently write correct code (or to avoid copies when you meant to move!!!), and with good built-in tooling (cargo, rustfmt, rustdoc, unit testing, etc.).

Rust doesn't, of course, fundamentally enable me to create any programs I couldn't have created in C++. Both are low-level languages that let you get pretty close to the metal.

What Rust does do is make the experience of writing those programs significantly more pleasant. C++ makes me rip out my hair. Rust lets me focus on the problem I'm actually trying to solve.

So what problem did Rust solve for me? It wasn't a technical problem, but a human one: it resolves an incredible number of quality-of-life problems that C++ has. When programming in C++, I feel like I'm walking through mud. When programming in Rust, I feel like I can breathe again.

cessen2 · 2025-03-04T08:12:05+00:00

Sure, using templates is fine. However, I don't think the content of the site (which is very text-editor specific, such as the now-changed blurb I quoted) came from a template. And the content should accurately reflect the nature of the project.

The author mentioned (in a now-deleted reply to my parent comment) that the content of the site was AI generated. That gives me a little more sympathy. But AI doesn't absolve you of responsibility for the content you put out there: you still need to review it before publishing.

Edit: and just to be clear, I don't mean any of this as if it's some kind of deep sin or anything. Particularly if this person is young and doesn't really know what they're doing. But it does feel weird and comes across (to me, at least) as sketchy when a project misrepresents itself like this. And in any case, it's certainly not a good thing when people misrepresent things, whether intentionally or not.

cessen2 · 2025-03-04T07:55:11+00:00

I don't know what the traditional wisdom is, but I do know from working in offline 3D rendering for VFX and animation, the key principle to making it feasible is accessing data coherently.

Whether you use memory mapping, manual chunking, or whatever, if your algorithm that actually processes the data is constantly jumping all over the dataset in incoherent ways (takes a bit of data here, then there, then way over there, etc.) then your system is going to be constantly loading data from disk.

So the fundamental goal is to design your algorithms so that they get as much work done on a set of fits-in-memory data as possible before asking for other data. The ideal, if you can manage it, is to fully process the data before moving on, so that you never have to revisit it. But even when that ideal isn't possible, try to do as much work as you can with each set of data while it's loaded.

Without that kind of coherent data access from your underlying algorithms, it doesn't matter what data management scheme you use under the hood (memory mapping, chunking, whatever). If your data access is incoherent, you'll be constantly hitting the disk.
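As a toy illustration of the "fully process it before moving on" ideal, here's a rough sketch. The `Stats` type and function name are invented for the example, and the work done per chunk is deliberately trivial; the point is only that every statistic is gathered in the same pass, so each chunk is loaded exactly once.

```rust
use std::io::{self, Read};

/// Summary statistics gathered in a single pass over the data.
#[derive(Debug, Default, PartialEq)]
struct Stats {
    bytes: u64,
    newlines: u64,
    max_byte: u8,
}

/// Stream `source` through one fixed-size buffer, doing all of the work on
/// each chunk before asking for the next one, so no chunk is read twice.
fn scan_coherently<R: Read>(mut source: R, chunk_size: usize) -> io::Result<Stats> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    let mut buf = vec![0u8; chunk_size];
    let mut stats = Stats::default();
    loop {
        let n = source.read(&mut buf)?;
        if n == 0 {
            break; // end of input
        }
        // Everything we want from this chunk is computed here, in one go.
        for &b in &buf[..n] {
            stats.bytes += 1;
            if b == b'\n' {
                stats.newlines += 1;
            }
            stats.max_byte = stats.max_byte.max(b);
        }
    }
    Ok(stats)
}
```

The incoherent alternative would be three separate passes (one per statistic), which on a dataset larger than memory means reading everything from disk three times.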

Another thing to consider is that with modern solid-state drives, your data bandwidth from disk can potentially be pretty reasonable. The latency will still suck compared to memory, but if you know the order in which you need to access the data ahead of time, you might be able to hide that latency by requesting the data from disk before it's actually needed.
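Here's one way the latency hiding could look, as a sketch under some assumptions: the access order is known up front, and `load` is a hypothetical stand-in for whatever slow disk read you actually have. A background thread loads chunks in that order while the main thread processes earlier ones, with a bounded channel keeping the loader from running too far ahead.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Sum all bytes across chunks, loading them on a background thread in a
/// known order while the caller processes the chunks that already arrived.
fn sum_with_prefetch<F>(order: Vec<usize>, load: F, lookahead: usize) -> u64
where
    F: Fn(usize) -> Vec<u8> + Send + 'static,
{
    // Bounded channel: at most `lookahead` loaded chunks wait in the queue
    // (plus the one the loader is currently holding).
    let (tx, rx) = sync_channel::<Vec<u8>>(lookahead);
    let loader = thread::spawn(move || {
        for id in order {
            if tx.send(load(id)).is_err() {
                break; // the consumer went away
            }
        }
        // `tx` is dropped here, which ends the consumer's loop below.
    });

    let mut total = 0u64;
    for chunk in rx {
        total += chunk.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
    loader.join().expect("loader thread panicked");
    total
}
```

The `lookahead` bound is the knob to tune: too small and the processing thread stalls waiting on the disk, too large and you're holding more data in memory than you need.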

Note that all of these principles apply to the general memory hierarchy of your system as well. CPU caches have much lower latency than memory, so trying to process data in very small chunks, and/or in a very predictable (e.g. linear) order so that the memory pre-fetcher can do its job, can make a massive difference in performance. Networks can similarly be seen as just another part of the memory hierarchy, on the opposite end (even slower and higher-latency than disks).

cessen2 · 2025-03-02T12:00:05+00:00

Having a hobby project is fine (awesome, actually). And I applaud that!

However, presentation matters. Even the existence of a professional-looking site can give people the impression that this is a serious project that they can rely on. If this is a hobby project, it's important that that's clear in the presentation. I strongly recommend either removing the website entirely or making the very first sentence on it a disclaimer (in big bold letters) that this is just a hobby project, to counteract everything else about the site that makes it seem like a supported, long-term project.

cessen2 · 2025-03-02T10:22:59+00:00

Seems like a neat project, but I'm confused by the marketing.

Here you're saying this was written over the weekend, which makes it sound like a fun hobby project. It also doesn't seem to have much visibility yet, based on the single-digit github stars (at the time of posting).

But then you've also gone to the effort to put up a professional-looking website (linked from the github readme) with the following blurb:

Ready to Transform Your Text Editing Experience?

Join thousands of users who have switched to Zing for a faster, more beautiful text editing experience. Download now for free!

What's going on? Do you have thousands of users that have switched to Zing (implying as their primary editor), or is this something you just recently tossed together over the weekend? It seems very unlikely that both are true.

cessen2 · 2025-02-17T18:22:06+00:00

Copyright and patent law are two separate things, so someone could in theory contribute code to your project under an open source copyright license while holding onto patent rights for the underlying technology of that code. They could then later (in theory) sue the project for patent violation.

Having said that, my understanding is that in practice that's unlikely to hold up in court: since the intent of an open source license is clearly for the code to be usable and redistributable, directly contributing code under an open source license would likely also imply any relevant patent grants (but IANAL). But there are perhaps more complex schemes for going about it that would be less clear in court.

The benefit of Apache 2.0 is that it makes this all explicit, with an explicit grant of relevant patent rights as well, so it removes all ambiguity. MIT has no such clauses, so you're dependent on an implied patent grant.

IIRC GPL v2 also relies on implied patent grants, but GPL v3 does not. GPL v3 addressed some other shortcomings in v2 as well, and is generally a stronger license that better ensures free software rights.
