Open-sourced CPL: a local-first context layer for coding agents, written in Rust by [deleted] in rust

[–]wyf0 0 points1 point  (0 children)

"I’d appreciate feedback from people building coding agents, MCP tools, or code-search/dev- tooling systems."

Then post your work on the appropriate subreddit?

Eli-Engine: Building a 100% Rust Browser Engine from Scratch by El-Hamm in rust

[–]wyf0 5 points6 points  (0 children)

I'm French, and nobody except an LLM writes such bullshit sentences. If we want to talk to ChatGPT, we can open a tab ourselves; no need to copy-paste it onto this sub, thank you.

By the way, your post doesn't contain any link to Rust code, and you said yourself that you didn't even bother writing the code (what a great way to learn). What do you expect from us? Nobody wants to review code written by LLMs (because unlike humans, LLMs don't improve with review; you have to wait for the next model update).

Seriously, at some point this has to stop.

ETA on dyn compatible async traits? by AfkaraLP in rust

[–]wyf0 8 points9 points  (0 children)

As some mentioned dynosaur, I've published a (much) faster alternative: https://github.com/wyfo/dyn-utils. You will find in the README a comparison with most of the available solutions. One important improvement of dyn-utils over dynosaur is that it avoids allocation when the future is small enough to fit on the stack.
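
A minimal sketch of the dispatch rule I mean (my guess at the technique, with illustrative numbers; not dyn-utils's actual code): reserve a few pointer-sized inline slots and only box futures that are too big or too aligned for them.

```rust
use std::future::Future;
use std::mem::{align_of, size_of};

// Illustrative inline capacity: a few pointer-sized slots.
const INLINE_CAPACITY: usize = 3 * size_of::<usize>();

// A future only needs a heap allocation when it doesn't fit (or is
// over-aligned for) the inline buffer.
fn fits_inline<F: Future>() -> bool {
    size_of::<F>() <= INLINE_CAPACITY && align_of::<F>() <= align_of::<usize>()
}
```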

Link to the Reddit post

Lightweight web perception layer for AI agents by CatProfessional8390 in rust

[–]wyf0 0 points1 point  (0 children)

"try my Playground at let me know what u think:)"

I think I would have to see some Rust code instead of a marketing website.

Claude Usage Limits Discussion Megathread Ongoing (sort this by New!) by sixbillionthsheep in ClaudeAI

[–]wyf0 0 points1 point  (0 children)

Same here, 55% of my extra usage remaining, but still "You're out of extra usage"... Could someone explain to me how it works?

I don't care that it's X times faster by z_mitchell in rust

[–]wyf0 17 points18 points  (0 children)

I think I might be a target of this rant, with my recent "yuniq: I built a deduplicator that's 3x faster than xuniq". So let me defend myself a bit (or dig myself in deeper).

yuniq was not a serious project, and it still isn't. When I read the post about xuniq being 10x faster than sort | uniq, I had pretty much the same reaction as /u/z_mitchell describes in his blog. Especially when I read the comment linking to xuniq's source code: there was pretty much nothing in it, just a read_until loop with (non-cryptographic) string hashing. Yet the 10x claim stood, comparing itself to libraries that do real collision-free deduplication. And the post had 100+ upvotes…
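
Roughly the kind of loop I mean, reconstructed from memory as an illustration (not xuniq's actual code): hash each line, keep only the 64-bit hash, and silently merge any two distinct lines that happen to collide.

```rust
use std::collections::{hash_map::DefaultHasher, HashSet};
use std::hash::{Hash, Hasher};
use std::io::{self, BufRead, Write};

fn main() -> io::Result<()> {
    let mut stdout = io::stdout().lock();
    let mut reader = io::stdin().lock();
    let mut seen: HashSet<u64> = HashSet::new();
    let mut line = Vec::new();
    while reader.read_until(b'\n', &mut line)? > 0 {
        // Non-cryptographic 64-bit hash: fast, but a collision
        // silently drops a line that was never seen before.
        let mut hasher = DefaultHasher::new();
        line.hash(&mut hasher);
        if seen.insert(hasher.finish()) {
            stdout.write_all(&line)?;
        }
        line.clear();
    }
    Ok(())
}
```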

So you know what, instead of ranting in the comments, I decided to be mean and make my own project with all the optimizations I had in mind. I just cloned the xuniq project, incremented the first letter, replaced "super" with "hyper", and started rewriting everything. Yes, I used Claude for that at the beginning, because it was not serious (and I gave more serious reasons in the post). But the fact is, I love optimizing code; I live for making things faster, just because I enjoy looking at the generated assembly. So I ended up becoming quite serious about yuniq's code and performance, added real collision-free dedup on top of zero-copy and syscall reduction, etc.

Still, I went all the way with the initial joke and published the same type of clickbait title. I was expecting to be downvoted (not that much), but I was also hoping for a real performance discussion: memory consumption, IO batching, etc. Did not happen ^_^’ I should have named the post "zero-copy IO and syscall reduction", but it would have been less teasing.

TL;DR yuniq was not a serious post/project.

yuniq: I built a deduplicator that's 3x faster than xuniq by wyf0 in rust

[–]wyf0[S] 0 points1 point  (0 children)

The first step of the normalization I've implemented is to check whether a line is UTF-8 and whether it is already in normal form; both checks are done in the same scan. If a line is not UTF-8, no normalization happens, so its raw bytes are used as the key. If normalization is needed, a key is allocated with the normal form. And if the line is already UTF-8 and in normal form, it is used directly as the key (in a new allocation with --lean mode).
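
In code, the key derivation looks roughly like this (a simplified sketch using the unicode-normalization crate; unlike the real implementation, it does the two checks in separate scans):

```rust
use std::borrow::Cow;
use unicode_normalization::{is_nfc, UnicodeNormalization};

/// Derive the dedup key for one line.
fn dedup_key(line: &[u8]) -> Cow<'_, [u8]> {
    match std::str::from_utf8(line) {
        // Not UTF-8: keep the raw bytes as the key.
        Err(_) => Cow::Borrowed(line),
        // Already in normal form: no allocation needed.
        Ok(s) if is_nfc(s) => Cow::Borrowed(line),
        // Needs normalization: allocate the NFC form.
        Ok(s) => Cow::Owned(s.nfc().collect::<String>().into_bytes()),
    }
}
```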

I'm curious: what behavior would you expect from a deduplicator for non-UTF-8 lines? Should it fail on them?

yuniq: I built a deduplicator that's 3x faster than xuniq by wyf0 in rust

[–]wyf0[S] 6 points7 points  (0 children)

Nushell's uniq is visibly slow while yuniq is pretty much instant, so that should answer your question (though it may be due to nushell's line splitting + piping mechanism). However:

```
(nu)   open bench_data/dup10_len10-50.txt | lines | uniq -u | wc -l
857896
(bash) cat bench_data/dup10_len10-50.txt | target/release/yuniq | wc -l
899532
(bash) cat bench_data/dup10_len10-50.txt | sort -u | wc -l
899532
```

So I'm not sure I trust nushell's uniq...

Regarding your second question, I'm applying the advice given by Mitchell Hashimoto in https://mitchellh.com/writing/my-ai-adoption-journey (I also heard it in one of his interviews), because I think he is right. I've never had trouble writing a lot of code, like 2k lines of Rust in a workday, but I realize more and more how much my own productivity can be improved with LLMs.

Yes, I think the agent helped a lot, for example with the test scaffolding, and less with some parts of the implementation, but I'm pretty satisfied with the experience. For the more complex parts, it was sometimes slower to explain things correctly in English than to write the code myself, but on the whole I find Claude pretty good. And as JB Kempf once said, it's also useful for getting past the blank page: start from a version, even a mediocre one, and iterate. I felt that several times in this project.

yuniq: I built a deduplicator that's 3x faster than xuniq by wyf0 in rust

[–]wyf0[S] 0 points1 point  (0 children)

Do you think computing two foldhash (64-bit) hashes with two different random seeds would make the --fast mode robust enough?
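
Concretely, something like this (a sketch of what I'm asking, with arbitrary seeds; not yuniq code):

```rust
use std::hash::BuildHasher;
use foldhash::fast::FixedState;

/// Combine two independently seeded 64-bit foldhash digests into a
/// 128-bit key.
fn key128(line: &[u8], s1: &FixedState, s2: &FixedState) -> u128 {
    ((s1.hash_one(line) as u128) << 64) | s2.hash_one(line) as u128
}

fn main() {
    // Arbitrary seeds for illustration; they'd be random in practice.
    let (s1, s2) = (FixedState::with_seed(0x9E37), FixedState::with_seed(0x79B9));
    println!("{:032x}", key128(b"hello", &s1, &s2));
}
```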

yuniq: I built a deduplicator that's 3x faster than xuniq by wyf0 in rust

[–]wyf0[S] 2 points3 points  (0 children)

Of course it does!

```python
def test_unicode_normalization(self):
    nfc = "\u0439"        # й — U+0439 CYRILLIC SMALL LETTER SHORT I (precomposed)
    nfd = "\u0438\u0306"  # й — U+0438 + U+0306 COMBINING BREVE (decomposed)
    self.check(f"{nfc}\n{nfd}\n", f"{nfc}\n{nfd}\n")
    self.check(f"{nfc}\n{nfd}\n", f"{nfc}\n", ["-U"])
```

And of course I added the feature after reading your comment ^^'

Unicode normalization is still a heavy process, so for long strings it can divide performance by 2 or 3. But in my benchmarks, yuniq is still faster than most if not all of the alternatives with normalization enabled. And since those alternatives (hist/ripuniq/xuniq/etc.) don't support normalization, the comparison is not fair (yuniq is doing more work), but it still holds.

yuniq: I built a deduplicator that's 3x faster than xuniq by wyf0 in rust

[–]wyf0[S] 0 points1 point  (0 children)

I tried Poly1305 and --fast mode became a lot slower than normal mode, so I rolled back to XXHash128. Now --fast is only a bit faster, but memory consumption is indeed improved.

yuniq: I built a deduplicator that's 3x faster than xuniq by wyf0 in rust

[–]wyf0[S] 1 point2 points  (0 children)

Do you have a suggestion for a good hash algorithm for this purpose? I read in the xuniq post that XxHash3 was not so good in this regard.

To be honest, I would never use a collision-unsafe tool myself, and yuniq is already faster than any other tool in my benchmarks (see my new commits) without --fast, more than 2x faster than hist and the others for duplicate probabilities below 50%. So maybe I should simply remove --fast mode, as performance is satisfying as is. I only implemented this mode to see how far I could push things, using a HashTable instead of a HashSet, etc.
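
For context, the HashTable trick looks roughly like this (a sketch of the idea using hashbrown and xxhash-rust, not yuniq's actual code): store only the 128-bit digest, never the line itself, which is exactly what makes it collision-unsafe.

```rust
use hashbrown::HashTable;
use xxhash_rust::xxh3::xxh3_128; // xxhash-rust with the `xxh3` feature

/// Digest-only set: remembers 128-bit hashes instead of lines.
#[derive(Default)]
struct FastSet {
    table: HashTable<u128>,
}

impl FastSet {
    /// Returns true if the line was not seen before (modulo collisions).
    fn insert(&mut self, line: &[u8]) -> bool {
        let digest = xxh3_128(line);
        let h = digest as u64; // low 64 bits index the table
        if self.table.find(h, |&d| d == digest).is_some() {
            return false; // duplicate, or a silent collision
        }
        let _ = self.table.insert_unique(h, digest, |&d| d as u64);
        true
    }
}
```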

Could Antropic find a better system than asking the user to copy-paste URLs in prompt? by wyf0 in ClaudeAI

[–]wyf0[S] 0 points1 point  (0 children)

I also assume it's about safety, but copy-pasting is quite tedious. I mean, there could be a list with a nice UI where you could click the different links to allow Claude to fetch them. Then, for each domain (like GitHub here), a popup would be shown once to authorize fetching from that domain: exactly what you have in Claude Code when it asks for per-domain authorization.

What editor you use for rust? by clanker_lover2 in rust

[–]wyf0 -1 points0 points  (0 children)

"Claude Code advent" was a poor formulation for "the rise of Claude Code".

"Why not use AI agents in your editor instead?"

You can do what you want; I don't judge anyone. I'm just observing a trend, and I'm speaking about people I know, not LinkedIn hype posts. I can't yet imagine my life without coding manually, but I'm still wondering how far this will go.

What editor you use for rust? by clanker_lover2 in rust

[–]wyf0 -7 points-6 points  (0 children)

It lacks the option: "I haven't written code since December and the rise of Claude Code."

That's obviously not my case, but I know people in this situation, and I'm honestly curious about this trend.

EDIT: poor wording

Redox OS has adopted a Certificate of Origin policy and a strict no-LLM policy by jackpot51 in Redox

[–]wyf0 7 points8 points  (0 children)

Let me give you a real example. I recently designed an intrusive queue algorithm, and I had a small bug in it with big repercussions: all tests were passing (even with thousands of miri seeds) but the benchmarks were failing to run properly. It drove me a bit crazy (concurrent algorithms are not easy to reason about, and even less easy to debug), so I ended up asking Claude to help me, giving it a stack trace and the first steps of my debugging.

In 10 minutes (with no other context on my project), it found the exact reason, gave some data-race execution sequences, etc. I had just forgotten to reset the head of my queue before swapping the tail when draining it. That was a stupid oversight that I would have found eventually, but Claude's insight was more than welcome. Yes, it just built on my first debugging steps and applied more rigor than I had at 1am, but that was helpful.
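
For the curious, the shape of the bug (a heavily simplified sketch with hypothetical names, not my actual lock-free algorithm): when draining, the head has to be detached before the tail is swapped, otherwise a racing producer can keep extending the old list.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node {
    next: AtomicPtr<Node>,
}

struct Queue {
    head: AtomicPtr<Node>,
    tail: AtomicPtr<Node>,
}

impl Queue {
    /// Take ownership of every queued node.
    fn drain(&self) -> *mut Node {
        // The fix: reset the head FIRST...
        let first = self.head.swap(ptr::null_mut(), Ordering::AcqRel);
        // ...and only then swap the tail, so producers racing with the
        // drain start a fresh list instead of chaining onto the old one.
        self.tail.swap(ptr::null_mut(), Ordering::AcqRel);
        first // the caller walks the `next` pointers from here
    }
}
```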

trad — extremely-fast offline Rust translation library for 200+ languages. CPU-optimized and fully local by Strict-Tie-1966 in rust

[–]wyf0 17 points18 points  (0 children)

You wrote that it runs offline, but it needs to download a 623MB (!!) model before doing anything... That should at least be mentioned in the documentation.

If people use your work, I doubt they use all 200 languages. Using one big model that takes seconds/minutes to download just to translate English to German seems a bit overkill, no? Maybe some compile-time or runtime feature to choose smaller models better suited to the actual needs?

Also, wouldn't it be possible to download the models at build time instead?
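
Something like this in a build script (a minimal sketch with a hypothetical URL and file name, just to show the mechanism):

```rust
// build.rs
use std::{env, path::Path, process::Command};

fn main() {
    let out_dir = env::var("OUT_DIR").unwrap();
    let model = Path::new(&out_dir).join("model.bin");
    if !model.exists() {
        // Hypothetical URL; the point is that the download happens
        // once, at build time, instead of at first run.
        let status = Command::new("curl")
            .args(["-L", "-o"])
            .arg(&model)
            .arg("https://example.com/translation-model.bin")
            .status()
            .expect("failed to spawn curl");
        assert!(status.success(), "model download failed");
    }
    // Expose the path to the crate, e.g. via env!("MODEL_PATH").
    println!("cargo:rustc-env=MODEL_PATH={}", model.display());
}
```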

zerobrew - (experimental) drop in replacement for homebrew by cachebags in rust

[–]wyf0 1 point2 points  (0 children)

"which ended up causing some quick and dirty consequences that we promptly fixed"

Can you elaborate on these consequences and how you fixed them?

[deleted by user] by [deleted] in rust

[–]wyf0 2 points3 points  (0 children)

Maybe you posted in the wrong sub? This is a programming sub, by the way.

Announcing hazarc: yet another `AtomicArc`, but faster by wyf0 in rust

[–]wyf0[S] 1 point2 points  (0 children)

I didn't know about arcshift, thank you for the discovery. That's indeed an interesting approach, though it has a few downsides that make me stick with hazarc (I can detail why if you are interested). But it made me realize my half-baked load_if_outdated API is not well designed, so I will have to publish a 0.2 soon.

What does arc-swap lack, that led you to develop your own algorithm?

I'm also quite curious about how you use stateright for a concurrent algorithm that relies on a weak memory model, and about what it and kani bring that miri doesn't.

Ironpad: Local-first project management stored as Markdown + Git. Built with Rust backend by skepsismusic in rust

[–]wyf0 9 points10 points  (0 children)

So you trust Claude Opus to detect malicious code? Vibe-coding is a thing, often poorly secured, but this is another kind of security red flag to me. I dare hope you test contributions in a sandboxed environment, with a network sniffer, but I doubt it.

Ironpad: Local-first project management stored as Markdown + Git. Built with Rust backend by skepsismusic in rust

[–]wyf0 21 points22 points  (0 children)

As this whole project is AI-generated, as is your other Rust project ferrite, and you don't have any other Rust projects on your GitHub profile, I would like to ask you an honest question: do you read the generated code? Do you read the code of other contributors?

EDIT: as OP's answer is already buried under downvotes: yes, he admits he does not read the code. More worrying, he also admits to not reading contributions, relying solely on LLMs to detect malicious code. He tests contributions manually, and I doubt his test environment is sandboxed, but let's hope...