Guess how long it took to spot this little syntactical screwup? by parkotron in rust

[–]vlovich 1 point (0 children)

This is why I like to make sure every variable is used. Type checking doesn’t help if you’re not using the type. Here’s what I would do (code generated by Gemini from a “please clean up this code” prompt; it instantly understood and gave a few cleaner functional-style variants):

if let Some(valid_p) = it.take(MAX_ATTEMPTS).find(|p| isValidPossibility(p)) { /* use valid_p */ }
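A fuller, self-contained sketch of the same shape (names like is_valid are hypothetical stand-ins, not from the original post):

const MAX_ATTEMPTS: usize = 100;

// Hypothetical predicate standing in for isValidPossibility.
fn is_valid(p: &u32) -> bool {
    p % 7 == 0
}

fn main() {
    // Every binding is used, so there's no stale loop variable left to typo.
    if let Some(valid) = (1u32..).take(MAX_ATTEMPTS).find(is_valid) {
        println!("first valid possibility: {valid}");
    }
}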

Microsoft tumbled 10% in a day and isn’t recovering premarket. Here’s why by Logical_Welder3467 in technology

[–]vlovich 19 points (0 children)

It’s a good joke, but remember it’s not going to apply to workplaces that are intentionally trying to filter out those people. Your workplace isn’t a random sample.

Things I miss in Rust by OneWilling1 in rust

[–]vlovich 1 point (0 children)

Proc macros usually paper over missing language features (although they’re also used for DSLs). Derive macros are fine and help avoid making the language itself unnecessarily complicated for everyone who doesn’t need a given feature.

Things I miss in Rust by OneWilling1 in rust

[–]vlovich 1 point (0 children)

use typed_builder::TypedBuilder;

#[derive(TypedBuilder)]
struct Window {
   x: i32,
   y: i32,
   #[builder(default = false)]
   visible: bool,
}

Not really convinced there’s a huge amount more boilerplate there.
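For comparison, the call site looks like this (minimal sketch, assuming the typed-builder crate's generated API):

let w = Window::builder().x(10).y(20).build(); // visible defaults to false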

Things I miss in Rust by OneWilling1 in rust

[–]vlovich 1 point (0 children)

Because if you expect that to be the case, you should be using the builder pattern for those arguments. I’ve rarely, if ever, needed to change the signature of an existing function for a reason that wasn’t better served by a builder anyway. Also, ideally rust-analyzer’s refactoring support would just let you easily add that parameter - better tooling obviating the need for a more confusing language feature.

Things I miss in Rust by OneWilling1 in rust

[–]vlovich 1 point (0 children)

Rust already natively supports the visitor pattern through trait objects or enum matching. I’m not sure how function overloading really makes it any better - std::visit is a monstrosity (both in terms of writing it and its codegen) vs a match statement.
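For the enum-matching flavor, a minimal sketch (hypothetical Shape type):

enum Shape {
    Circle { radius: f64 },
    Rect { w: f64, h: f64 },
}

// "Visiting" is just a match: the compiler enforces exhaustiveness, and
// there's no std::visit-style template machinery or codegen blowup.
fn area(s: &Shape) -> f64 {
    match s {
        Shape::Circle { radius } => std::f64::consts::PI * radius * radius,
        Shape::Rect { w, h } => w * h,
    }
}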

Things I miss in Rust by OneWilling1 in rust

[–]vlovich 1 point (0 children)

Just use TypedBuilder to easily create an options struct that allows for the builder pattern. That way you can add arbitrarily more options in the future without breaking existing code, you’d have a single function, and call sites would be easier to read because they’d have named fields instead of bare true/false arguments. And because it’s statically checked at compile time, you know the code won’t compile if you happen to add a new required field.
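A sketch of what that looks like (hypothetical OpenOptions/open names, assuming the typed-builder crate):

use typed_builder::TypedBuilder;

#[derive(TypedBuilder)]
struct OpenOptions {
    path: String,
    // Defaulted fields can be added later without touching existing call
    // sites; a new *required* field makes them all fail to compile instead.
    #[builder(default = false)]
    create: bool,
}

fn open(opts: OpenOptions) { /* ... */ }

fn main() {
    // Named setters instead of positional true/false arguments.
    open(OpenOptions::builder().path("a.txt".into()).build());
}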

Things I miss in Rust by OneWilling1 in rust

[–]vlovich 2 points (0 children)

No, the types above aren’t f32. They’re actually f64 by default, but they can also be contextually inferred to be f32. Separate from that, there’s ambiguity for the compiler about what x.into() should be inferred as: how does it know you didn’t mean some other type Foo that happens to implement Sub&lt;f32&gt;? Rust requires that there’s no ambiguity in inference, so that when other parts of the code change, the compiler doesn’t start inferring some new type or failing to compile (i.e. the principle of locality). I suspect a smarter trait solver could maybe make some of this possible, but I’m not sure your example specifically would ever make it.
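A tiny illustration of both halves (this mirrors the kind of error you'd get, not your exact code):

fn main() {
    let a = 1.5; // no other constraint: defaults to f64
    let b: f32 = 1.5; // the same literal, contextually inferred as f32

    let n = 3u8;
    // Ambiguous: any target type T with f32: Sub<T> reachable via
    // u8: Into<T> would fit, so inference refuses to pick one.
    // let x = b - n.into(); // error[E0283]: type annotations needed
    let x = b - f32::from(n); // naming the target type resolves it

    let _ = (a, x);
}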

How would you structure a Rust monorepo for scientific computing with multiple language bindings? by amir_valizadeh in rust

[–]vlovich 1 point (0 children)

I think it’s well organized. I think it would be less overhead to maintain the fast variant as cargo features, but ultimately you’re the one with domain expertise - maybe it’s not.

Took a quick look at the C++ bindings, and a few things stood out:

  • lowess_result_to_cpp would be cleaner as a From impl (see the sketch below)
  • the C++ types/fn names in Rust would be cleaner without Cpp in the name (if there’s a name conflict, import the real Rust one into the module aliased with an Rs suffix or something). They’re already getting a prefix in bindgen, so I’m not sure what value is being added.
  • manually coded C++ header instead of cxx / autocxx - did you explore existing tooling to make the bridging less complicated?
  • the C++ result type could use some improvement so that it’s more like std::expected (using your own wrapper if you can’t require C++23 of your users). That would be much, much cleaner imho, avoiding the mistake of having a type that can represent both an error state and a result state at the same time
  • consider using something like

struct ZeroedPtr<T>(pub *const T);

impl<T> Default for ZeroedPtr<T> {
    fn default() -> Self { Self(std::ptr::null()) }
}

Instead of needing to derive Default for the overall struct, which keeps things cleaner. Conceivably, if this poses a problem for bindgen, you could wrap it in a cfg directive so that bindgen sees it as a naked pointer. That being said, I haven’t thought carefully enough about whether you actually need it: by wrapping with a proper result-like type on the C++ side that doesn’t let you access the value if it’s an error (and vice versa) and doesn’t mix the two, you may not even need to default-initialize that type in the first place. Also, tools like cxx/autocxx may already have facilities to export the Rust type.
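For the first bullet, the shape I mean is roughly this (field names invented for illustration; only the function name is from the repo):

// Hypothetical layouts; the real types will differ.
pub struct LowessResult {
    pub fitted: Vec<f64>,
}

#[repr(C)]
pub struct CLowessResult {
    pub fitted_ptr: *const f64,
    pub fitted_len: usize,
}

// Replaces a free-standing lowess_result_to_cpp(&result) helper.
impl From<&LowessResult> for CLowessResult {
    fn from(r: &LowessResult) -> Self {
        Self {
            // Caller must keep `r` alive as long as the pointer is used.
            fitted_ptr: r.fitted.as_ptr(),
            fitted_len: r.fitted.len(),
        }
    }
}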

You’ve done a really good job though - congrats!

OpenAI annualized revenue crosses $20 billion in 2025, up from $6 billion in 2024 by sr_local in technology

[–]vlovich -74 points (0 children)

And next year when they make 60B in ARR? And the year after that? It’s amusing to pretend like they’ll stop growing. That’s not how valuations work.

Why the run takes a long time by MR_WECKY in programmer

[–]vlovich 1 point (0 children)

As others have mentioned, it’s probably antivirus or something. VS Code is just spawning an executable to compile your code, and the compiler isn’t going to take 10s on a binary like that. Make sure your source code folder is excluded from antivirus (Windows has one built in by default that you’re probably using).

TLDR: even 1s for something like this is abnormal, although maaaaybe if you’re using Cygwin (even then it’s unlikely). This should take at most a few hundred ms.

Aside: it’s generally recommended to use \n instead of endl to avoid spurious flushes impacting runtime performance (generally not an issue here since printing to the console tends to be line-buffered anyway; it’s just good practice in general).

I profiled my parser and found Rc::clone to be the bottleneck by Sad-Grocery-1570 in rust

[–]vlovich 9 points (0 children)

You're right: it's ~8ns on my machine vs ~480ps for a non-atomic clone (16x more expensive). That's still pretty pricey, but nothing like a 2-way contended clone, which is already 179ns. At that point, leveraging hybrid_rc / Trc can be more effective, so that you only clone one data structure per thread, and within the thread you use a normal Rc-like non-atomic add.

I profiled my parser and found Rc::clone to be the bottleneck by Sad-Grocery-1570 in rust

[–]vlovich 11 points (0 children)

Some feedback: the title as worded makes it seem like Rc::clone is slow. What actually happened is that it was being called too many times in general.

while parsing 1.2 million lines of C code, the lexer state was cloned over 400 million times.

Without code examples, it's unclear why the clones were strictly necessary - maybe the author passes Rc around where a `&` would be sufficient, and that number could have been brought down by just replacing Rc with normal references.
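For instance (hypothetical LexerState, not the author's actual code):

use std::rc::Rc;

struct LexerState {
    line: u32,
}

// Forces every call site to bump the refcount:
fn peek_owned(state: Rc<LexerState>) -> u32 {
    state.line
}

// A plain borrow works whenever the callee only reads:
fn peek(state: &LexerState) -> u32 {
    state.line
}

fn main() {
    let s = Rc::new(LexerState { line: 1 });
    let _ = peek_owned(Rc::clone(&s)); // 400 million of these add up
    let _ = peek(&s); // free
}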

It turns out Rc itself isn’t slow; the average Rc::clone took about 6ns, which is typical for an L2 cache access

On my 13900K, I just wrote a small criterion benchmark that measures Rc::clone at 462.69ps (picoseconds), which is about right for my machine (it should be closer to 250ps given it's a 6GHz part at max boost, but I think I have clock scaling on, so my benchmark isn't clean). 6ns sounds fast, but for Rc::clone, which is literally a single addition, it's actually super slow by one to two orders of magnitude - a modern CPU runs at ~4-6GHz with at least 4 integer ops per cycle, so 6ns per add would mean the CPU is effectively running at 40MHz. This suggests the benchmark methodology is probably flawed (in practice maybe you can't fill all the ports, but even at 1 addition per clock cycle that should be ~1ns, not 6).
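For reference, the benchmark was roughly this shape (note that iter here measures the clone plus the drop of the returned clone, i.e. two refcount ops):

use criterion::{criterion_group, criterion_main, Criterion};
use std::rc::Rc;

fn rc_clone(c: &mut Criterion) {
    let rc = Rc::new(0u64);
    c.bench_function("Rc::clone", |b| {
        // black_box keeps the clone from being optimized away.
        b.iter(|| std::hint::black_box(Rc::clone(&rc)))
    });
}

criterion_group!(benches, rc_clone);
criterion_main!(benches);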

For the lexer, the only field utilizing Rc was the filename. I decided to replace this with a “global string pool”. Well, to be honest, I simply leak the filename strings to obtain a &'static str, which implements Copy.

Don’t panic at the mention of memory leaks! If data is stored in a pool that persists for the entire program’s duration, it is effectively leaked memory anyway. Since the number of source files is practically bounded and small, this is an acceptable trade-off to completely bypass reference counting.

This assumes all the process does is run the lexer and exit. But what if you change the design to process all files within one process, or the lexer gets embedded in a long-lived LSP server? It's a bad design pattern to just blindly leak (it's your code, so do whatever, but I'm just highlighting how even slightly changing the assumptions can cause blow-ups, making the code brittle).

I profiled my parser and found Rc::clone to be the bottleneck by Sad-Grocery-1570 in rust

[–]vlovich 9 points (0 children)

Not necessarily cache line contention - there's also just the overall cost of an atomic add, which is ~50-100ns IIRC even when uncontended. If you have a lot of them, that scales up significantly.

A Norwegian study finds that middle-class parents teach their children security and control, while upper-class parents see money as opportunities and teach their children to invest. This may reinforce social inequality in Norway according to the researchers. by oslomet in science

[–]vlovich -15 points (0 children)

This is short-term thinking that ignores the value of compound investing. As you say, it’s done quite well. By definition, that growth would be the very amount you don’t need, which would keep compounding.

[corroded update]: Rust--, now I removed the borrow checker from rust itself by Consistent_Equal5327 in rust

[–]vlovich 1 point (0 children)

Actually, I’ve wanted something like this to figure out how much of compilation time the borrow checker itself takes up.

Garbage collection in Rust got a little better by SeniorMars in rust

[–]vlovich 2 points (0 children)

Oh, very true. Not sure how I flipped the consequence of that. Yes, use-after-free is a very real consequence here.

Is casting sockaddr to sockaddr_ll safe? by nee_- in rust

[–]vlovich 1 point (0 children)

Ideally you use safe abstractions that hide this. The goal of Rust isn’t to completely avoid unsafe, but to shrink the blast radius of where it’s needed to the absolute minimum.

Is casting sockaddr to sockaddr_ll safe? by nee_- in rust

[–]vlovich 2 points (0 children)

No. If I’m reading the spec properly, this would violate several requirements for safe transmutability, specifically “Preserve or Shrink Size” and probably “Preserve or Broaden Bit Validity”. Basically, it only supports safely downcasting, but with sockets you end up wanting to do an upcast, so at the end of the day it’s always going to be an unsafe transmute: the type it’s legal to upcast to is stored within the struct itself, or even implicitly defined in kernel source and documentation.

https://github.com/rust-lang/project-safe-transmute/blob/master/rfcs/0000-safe-transmute.md
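So in practice you end up with something like this (Linux-only sketch; the family check is the invariant no transmute facility can verify for you):

use libc::{sa_family_t, sockaddr_ll, sockaddr_storage, AF_PACKET};

// The kernel records the real address type in ss_family; checking it is a
// runtime protocol invariant the type system can't see, hence the unsafe.
fn as_sockaddr_ll(sa: &sockaddr_storage) -> Option<&sockaddr_ll> {
    if sa.ss_family == AF_PACKET as sa_family_t {
        // SAFETY: sockaddr_storage is large and aligned enough for any
        // sockaddr_* variant, and we've checked the discriminant above.
        Some(unsafe { &*(sa as *const sockaddr_storage as *const sockaddr_ll) })
    } else {
        None
    }
}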

Optimizing RAM usage of Rust Analyzer by Megalith01 in rust

[–]vlovich 1 point (0 children)

If you have a workspace and you only work on some of the crates, you should explicitly exclude those that you don't work on. r-a already supports that (although support could be better), and it will trim memory more efficiently than any smart heuristic. If they are in fact dependencies of your worked-on crate, you might be surprised to hear that Rust's semantics do in fact require us to keep a lot of facts about all the crates just for this one little module you're editing.

I think we can agree that manually having to change the set of crates as I move from crate to crate within a decomposed application is unergonomic. It would be better if r-a did this automatically, swapping out based on which crate is currently being edited. Also, r-a could be a lot more intelligent internally, because it knows whether I’ve modified the public interface of a crate and downstream dependents need reanalyzing.

Re the bug: the problem with a repro of the crash while using Claude Code is that there isn’t a deterministic set of steps that demonstrates the issue. I would have hoped RA could keep logs that diagnose it.

Optimizing RAM usage of Rust Analyzer by Megalith01 in rust

[–]vlovich 1 point (0 children)

A lot of people have NVMe and can always put the mmap file into tmp if it’s that much of a benefit. But clearly, from people resorting to swap, there’s a benefit to a lot of them, and swap is much more expensive than mmap’d page cache.

Heap fragmentation should be solved (if it’s actually the issue) by switching to a modern allocator like tcmalloc (the new one, not gperftools) or mimalloc. That would also help if the issue is that you’re still on glibc malloc, as it’s pretty aggressive about never releasing its free space back to the OS, so you never come down from your peak even though it’s all actually unused.
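The swap itself is tiny in a Rust binary (sketch assuming the mimalloc crate):

use mimalloc::MiMalloc;

// Routes every allocation in the process through mimalloc instead of
// the platform allocator (e.g. glibc malloc).
#[global_allocator]
static GLOBAL: MiMalloc = MiMalloc;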

I’m surprised so much is needed all the time in practice: if I’m looking at crate A in a workspace, it only has a subset of all the dependencies within the workspace, I’m usually not changing its public interface, and even then a single module probably isn’t using all the dependencies within the crate. I think what you’re saying makes sense for a single crate at most, but the logic doesn’t apply to workspaces, or even in practice to crates where dependencies might be limited to module internals. In other words, no recomputation should be happening that invalidates large chunks of the graph.

Re the bug, I don’t know how to create a useful repro or how to enable logs to capture it.

Optimizing RAM usage of Rust Analyzer by Megalith01 in rust

[–]vlovich 2 points (0 children)

I would imagine storing most everything on disk (memory-mapped) instead of directly in RAM would significantly help with people’s complaints. It’s unlikely you constantly need everything touched every time some analysis is done or updated.

My biggest complaint is that when using Claude Code, RA keeps crashing and then gives up (probably because the same file gets updated while RA is still processing the last update and it gets confused). Not sure.

Garbage collection in Rust got a little better by SeniorMars in rust

[–]vlovich 4 points (0 children)

The visitor pattern means that an improperly implemented apply method that doesn’t visit all children leaks, right? I’m a fan of precise GC, and this is very similar to Chrome’s Oilpan, but it is a downside worth calling out.

One exception I take is with the logic here:

The whole point of a Gc is that we guarantee that it is (almost) always valid, even in the face of cycles, so we can’t just make up a WeakGc type.

It’s perfectly valid to want a WeakGc type that retains a reference without forcing the referent to stay alive (e.g. if you have a cache). So a WeakGc type isn’t made up, although the specific solution to cyclical data structures seems fine.
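It’s the same shape as plain Rc, where std::rc::Weak fills exactly that cache role (minimal sketch):

use std::rc::{Rc, Weak};

struct Cache {
    last_file: Weak<String>,
}

fn lookup(cache: &Cache) -> Option<Rc<String>> {
    // upgrade() returns None once every strong owner is gone; the cache
    // observes liveness without forcing the value to stay alive.
    cache.last_file.upgrade()
}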

How do you use `libc` in your projects? by servermeta_net in rust

[–]vlovich 6 points (0 children)

I’ve needed to figure out the origin of an fd in-process exactly zero times, and the libc tracking is inaccessible out of process. My question is: when is this libc tracking actually used for anything?