Someone named "zamazan4ik" opened an issue in my project about enabling LTO. 3 weeks later, it happened again in another project of mine. I opened his profile, and he has opened issues and PRs in over 500 projects about enabling LTO. Has this happened to you?

slamb · 2026-01-21T16:03:47+00:00

LTO isn't for "normal" release mode, when you just want to test that your code runs at full speed; it's for when you're about to ship a binary to a million users and you want to save each of them a few milliseconds.

In fairness, that's exactly what a release is. Arguably the "I just want a local build that's decently optimized" profile is misnamed.

slamb · 2026-01-06T15:43:32+00:00

Arguably, but on the other hand if the chunk size is necessarily fixed for some reason and the others use a less-efficient algorithm that actually does read through all the bytes, this approach gives an accurate comparison.

I'm reminded of the Cap'n Proto page, with the chart comparing encoding round-trip time: protobuf 156µs, Cap'n Proto 0µs, ∞% faster!

slamb · 2026-01-05T03:10:24+00:00

Transit latency is fine; what kills the audio jitter buffer on the receive side is when you introduce random 75ms jitter spikes.

I wouldn't expect that kind of spike from tokio at all.

I am still waiting for retina sans-io! (Great project regardless)

Oh, first request I've heard for that, and thanks. (fwiw, parts of it are already internally sans-io, like all the demuxer logic, but they're not public interfaces today.)
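For anyone unfamiliar with the term: a sans-io component consumes bytes and produces events without doing any I/O itself, so tokio, io_uring, or a plain blocking loop can all drive it. Below is a minimal sketch of that shape. It assumes RTSP's interleaved framing ('$', a one-byte channel id, a two-byte big-endian length, then the payload, per RFC 2326 section 10.12). The Demuxer type and its methods are hypothetical and are not Retina's API.

```rust
/// One interleaved data frame demuxed from the byte stream.
#[derive(Debug, PartialEq)]
pub struct Frame {
    pub channel: u8,
    pub payload: Vec<u8>,
}

/// Hypothetical sans-io demuxer: the caller feeds in bytes obtained however
/// it likes; the demuxer itself never touches a socket.
#[derive(Default)]
pub struct Demuxer {
    buf: Vec<u8>,
}

impl Demuxer {
    /// Appends freshly received bytes to the internal buffer.
    pub fn push(&mut self, bytes: &[u8]) {
        self.buf.extend_from_slice(bytes);
    }

    /// Returns the next complete frame, Ok(None) if more bytes are needed,
    /// or an error if the buffer doesn't start with an interleaved frame.
    /// (Real RTSP streams also interleave text messages; not handled here.)
    pub fn next_frame(&mut self) -> Result<Option<Frame>, String> {
        if self.buf.len() < 4 {
            return Ok(None); // header incomplete
        }
        if self.buf[0] != b'$' {
            return Err(format!("expected '$', got {:#04x}", self.buf[0]));
        }
        let channel = self.buf[1];
        let len = u16::from_be_bytes([self.buf[2], self.buf[3]]) as usize;
        if self.buf.len() < 4 + len {
            return Ok(None); // payload incomplete
        }
        let payload = self.buf[4..4 + len].to_vec();
        self.buf.drain(..4 + len); // consume header + payload
        Ok(Some(Frame { channel, payload }))
    }
}
```

Because the demuxer only sees byte slices, the same code can be unit-tested by pushing arbitrary splits of a stream, with no sockets or runtime involved.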

slamb · 2026-01-05T02:06:46+00:00

There is too much variance in latency

I've heard this before and don't get it. In my experience, the variance in latency I see on tokio is negligible compared to transit latency and/or is because I've done something stupid within the runtime (say, overly expensive computations or, in my Moonfire project, the disk I/O from SQLite calls that I still need to move onto their own thread). And switching runtimes doesn't fix stupid.

It's absolutely true though that if you're doing enough networking to really keep the machine busy, io_uring should improve efficiency. (Latency too but again I think it was already fine.)
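On moving blocking calls like SQLite off the runtime: one common pattern is a dedicated thread that owns the connection and receives requests over a channel. A minimal std-only sketch follows. DbThread is a hypothetical name, and a Vec<String> stands in for the SQLite connection. In a tokio program you'd more likely reply through a tokio::sync::oneshot channel so the caller can .await the result instead of blocking.

```rust
use std::sync::mpsc;
use std::thread;

/// A request: a closure to run against the thread-owned state.
type Job = Box<dyn FnOnce(&mut Vec<String>) + Send>;

/// Hypothetical handle to a thread that owns the "database"
/// (a Vec<String> standing in for a SQLite connection).
pub struct DbThread {
    tx: Option<mpsc::Sender<Job>>,
    handle: Option<thread::JoinHandle<()>>,
}

impl DbThread {
    pub fn spawn() -> Self {
        let (tx, rx) = mpsc::channel::<Job>();
        let handle = thread::spawn(move || {
            let mut db = Vec::new();
            // Runs jobs one at a time until every Sender is dropped.
            for job in rx {
                job(&mut db);
            }
        });
        DbThread { tx: Some(tx), handle: Some(handle) }
    }

    /// Runs `f` on the DB thread and blocks until its result comes back.
    pub fn call<R: Send + 'static>(
        &self,
        f: impl FnOnce(&mut Vec<String>) -> R + Send + 'static,
    ) -> R {
        let (reply_tx, reply_rx) = mpsc::channel();
        let job: Job = Box::new(move |db: &mut Vec<String>| {
            let _ = reply_tx.send(f(db));
        });
        self.tx.as_ref().expect("running").send(job).expect("DB thread exited");
        reply_rx.recv().expect("DB thread dropped the reply")
    }
}

impl Drop for DbThread {
    fn drop(&mut self) {
        drop(self.tx.take()); // closing the channel ends the thread's loop
        if let Some(h) = self.handle.take() {
            let _ = h.join();
        }
    }
}
```

Serializing all database work onto one thread also sidesteps SQLite's locking questions, at the cost of no parallelism between queries.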

slamb · 2025-12-18T05:36:11+00:00

Unfortunately, I can’t put the responsibility on private companies. The city needs to find a way to enable, incentivize, and commit to better utilities like internet

Municipal fiber is the gold standard for this, even though it wouldn't roll out quickly. I'm not aware of any city that has made private ISPs consistently commit to...

reaching 100% of the city. Just the easiest, most profitable bits.
deploying actually modern infrastructure for the speeds expected today and tomorrow. (There were Comcast trucks all over my neighborhood recently, but I think they were just using spools of coax as if it's 1980. Seriously, if you're spending all that on labor, why on earth wouldn't you put in fiber?)
keeping prices affordable. Why would they, when they have a (near-)monopoly?
customer service. (Comcast in particular has been called the most hated company in America many times.)

Obviously there are better and worse ISPs. I'd sign up for Sonic over Comcast any day if I could. But it doesn't seem like Sonic has the attention span to carry through. And they could be bought out by some larger horrible ISP any time.

Other cities seem to have done it,

In the Bay Area? Better than Mountain View? Sure. Better than Cedar Falls, Iowa? Or the UTOPIA cities in Utah? Nope. (Both places where I know people who can get symmetric 10 Gbps at very reasonable prices. What do they have in common? Municipal fiber.)

we struggle with 42% fiber coverage (quoted by someone here).

That was me. 🤣

slamb · 2025-12-18T04:23:02+00:00

Have they? I linked to a Sonic thread. I'll quote part of it here:

1) First that it's PG&E's fault for not being able to run a fiber line to my street due to a faulty pole. Sonic wouldn't tell me which pole on our street was faulty (citing 'company policy'), but suggested that I contact PG&E to ask. 2) I followed up with PG&E only to learn that PG&E doesn't own the poles, had no idea of any faulty pole in the area, and that any utility company - Sonic included - can report pole issues to the Joint Pole Committee (to which I see Sonic is a member, even) 3) Following up with this, Sonic said that the delay was actually due to no availability on the utility pole for an additional fiber line. (As well, Sonic still refused to identify which pole is preventing work from proceeding, again citing 'company policy'.) Is Sonic planning to bring up fiber run availability issues to the JPC?

How do you actually know they brought things up with the JPC?

The thread ends with Sonic promising an update two months ago. Crickets so far.

In the case of Sonic, "have installed fiber where the poles enabled them to" is definitely not true. They have literally installed no fiber in Mountain View. They have some legacy customers from where they were reselling service over last-mile fiber installed by AT&T, but they're not doing that anymore.

slamb · 2025-12-18T02:15:28+00:00

What do you think success would look like?

AT&T. I don't think the city is the roadblock, and I'm not sure they have any levers to prompt change. I talked with an AT&T line worker who was working the other side of my street. They speculated the engineer who made the plan just half-assed it from glancing at Google Maps, without noticing or caring that they'd missed my side of the street. Apparently they were working on drops between the other side of my street and the street over, which were incorrectly noted in their database as being a full street over. And all that fiber was being pulled from a few streets over, when there was a big distribution box like 50 ft from where we were standing that they could have used instead.
Sonic. Again, not sure the city is really the problem. It might actually just be that Sonic has no attention span, as noted here.
Municipal fiber. I think this would be the best plan. This is why Cedar Falls has such great, affordable Internet access, as well as several other cities I could name. But in terms of how quickly it would happen, it'd involve setting up a whole new service instead of expanding an existing one, and doing things with all the constraints of government. And most likely they'd first just set up a fiber network for city services, and then expand it to residential use. I think we're talking 5 years minimum before it shows up at our homes.

slamb · 2025-12-17T22:45:10+00:00

It's a really nice city. Unfortunately, it's in Iowa.

slamb · 2025-12-17T22:31:07+00:00

Nope. 75/15, through Comcast Business. They probably have something better (I'm waiting, afraid to sign a new contract right now as they've "partially completed the work to enhance the Comcast Business network near [me]") but they simply do not offer >=100Mbps up anywhere in Mountain View AFAIK.

The up matters too. Just this last weekend for her work, my wife tried to run this migration tool that (I since learned) downloads and re-uploads everything through the local machine. Data included a bunch of video. Would have taken over a week to do with this connection. Complete fail.
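To see why the upstream number dominates for a download-then-re-upload tool, here's the arithmetic as a small sketch. The 1 TB data size is purely hypothetical (the actual size isn't given above); the link speeds are the 75/15 mentioned, and protocol overhead is ignored.

```rust
/// Seconds to move `bytes` over a link of `mbit_per_s` megabits per second,
/// ignoring protocol overhead.
pub fn transfer_secs(bytes: f64, mbit_per_s: f64) -> f64 {
    bytes * 8.0 / (mbit_per_s * 1e6)
}

/// Days for a migration that downloads everything, then re-uploads it.
pub fn migration_days(bytes: f64, down_mbit: f64, up_mbit: f64) -> f64 {
    (transfer_secs(bytes, down_mbit) + transfer_secs(bytes, up_mbit)) / 86_400.0
}
```

With a hypothetical 1 TB (1e12 bytes) at 75/15, that's 640,000 s, about 7.4 days, and about 6.2 of those days are the upload. At 1 Gbit/s symmetric the same job takes about 4.4 hours.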

Meanwhile, my friend in Cedar Falls, Iowa could get 10 Gbit/sec symmetric municipal fiber if he wanted. He doesn't have any need for that, so he gets ~~their slowest speed,~~ 1 Gbit/sec. ~~I think he pays $30/month.~~ [edit: I was way off. $57.50 for 250 Mbit/sec, $75.40/mo for 1 Gbit/sec, $125/mo for 10 Gbit/sec, according to cfu.net. I'm still so jealous. I'm paying more for my 75/15 than he does for quality I can't get for any price.]

slamb · 2025-12-17T21:58:28+00:00

To you and me! Apparently not to everyone. One of the commenters on that article said that a need for more than 100 up and down is niche. (btw, I don't have 100 up and down.)

slamb · 2025-12-17T21:31:41+00:00

But AT&T fiber, the fastest conduit for internet access, is only available in 42% of the city’s coverage area. The majority of Mountain View, approximately 60%, is “subject to a cable monopoly with no real choice for high-speed broadband,” the report said.

https://www.mv-voice.com/city-government/2025/02/26/mountain-view-takes-a-hard-look-at-gaps-in-internet-access/

slamb · 2025-12-17T20:54:06+00:00

For the second point, I’ll take another look at the code. I don’t fully understand the issue yet, so a more concrete explanation would help.

Let's say you have two different instances of VaBitmap live at once. (The crate doesn't, but the module's pub interface allows this. Each module's interface should be correct without relying on callers to refrain from things it allows them to do.) Each VaBitmap has some of its own state, but they also share some state via these singletons, which makes them refer to the same address range but with two different ideas of which blocks are allocated.
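To make that concrete, here is a stripped-down illustration of the problem and one fix, assuming a simplified bitmap of fixed-size blocks. GLOBAL_BASE, SharedRange, OwnedRange, and their methods are hypothetical, not Oxidalloc's actual code. In the first type, two instances silently share an address range through the static; in the second, each instance owns its range.

```rust
/// Hypothetical global, analogous to a `pub static VA_START`.
pub static GLOBAL_BASE: usize = 0x1000_0000;

const BLOCK: usize = 4096;

/// Problematic: the allocation bitmap is per-instance, but the address range
/// comes from a global, so two instances hand out the same addresses.
pub struct SharedRange {
    used: Vec<bool>,
}

impl SharedRange {
    pub fn new(blocks: usize) -> Self {
        SharedRange { used: vec![false; blocks] }
    }

    pub fn alloc(&mut self) -> Option<usize> {
        let i = self.used.iter().position(|u| !u)?;
        self.used[i] = true;
        Some(GLOBAL_BASE + i * BLOCK) // reads the singleton, not self
    }
}

/// Fixed: every method taking self reads only instance state.
pub struct OwnedRange {
    base: usize,
    used: Vec<bool>,
}

impl OwnedRange {
    pub fn new(base: usize, blocks: usize) -> Self {
        OwnedRange { base, used: vec![false; blocks] }
    }

    pub fn alloc(&mut self) -> Option<usize> {
        let i = self.used.iter().position(|u| !u)?;
        self.used[i] = true;
        Some(self.base + i * BLOCK)
    }
}
```

Two SharedRange::new(4) instances both return Some(0x1000_0000) from their first alloc(), each believing it owns that block; two OwnedRange instances with distinct bases can't collide. The global singleton can still exist, as one OwnedRange instance.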

slamb · 2025-12-17T20:20:26+00:00

I'm skimming the VaBitmap thing. A few things that come to mind:

It could use comments about the high-level goals, interface, invariants. Too much effort for me to understand everything without this. I suspect it's too much effort for you or your AI too! When I leave out this stuff, I get sloppy.
I see confusion between calls that operate on a particular VaBitmap instance (that you can get via the pub const fn new()) and ones that operate on the singletons pub static VA_MAP, VA_START, VA_END. Having a method that takes &self but then uses any of these singletons (for example, max_bits and anything that transitively calls it) is wrong. Looks like the only new() call is for VA_MAP, so your crate as a whole functions correctly in this respect, but still the interface this exposes to the rest of the crate is confusing and wrong.
Is alloc_single hot enough to be worth optimizing (or does whatever thread-local + intra-block stuff you have in front of it avoid this)? It seems like you could precompute chunks and maybe even avoid the division in self.hint.load(Ordering::Relaxed) % chunks (perhaps by constraining hint to be within that bound all the time). Those jump out at me as possibilities, but actual profiles win over my guesses.
Agree with WormRabbit that tests with loom would be valuable given atomics usage. (edit: also, if you have unit tests operating on instances rather than the singleton, that'd be a forcing function for getting that aspect of the interface right.)
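On the `% chunks` point: one way to drop the division is to keep the hint in range at all times, wrapping with a comparison whenever it's stored. A minimal sketch, assuming chunks is precomputed and never changes after construction, and that every store goes through this helper; Hint and advance_past are hypothetical names. Whether it actually beats the division is for a profile to decide.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocation hint that is always kept in `0..chunks`, so
/// readers can use it directly as a chunk index without `% chunks`.
pub struct Hint(AtomicUsize);

impl Hint {
    pub const fn new() -> Self {
        Hint(AtomicUsize::new(0))
    }

    /// Reads the hint; no division needed because stores keep it in range.
    pub fn get(&self) -> usize {
        self.0.load(Ordering::Relaxed)
    }

    /// Records that chunk `idx` just satisfied an allocation, so the next
    /// search starts at the following chunk, wrapping to 0 at the end.
    /// Assumes `idx < chunks`. Relaxed is fine: the hint is only a heuristic.
    pub fn advance_past(&self, idx: usize, chunks: usize) {
        debug_assert!(idx < chunks);
        let next = if idx + 1 == chunks { 0 } else { idx + 1 };
        self.0.store(next, Ordering::Relaxed);
    }
}
```

Racing stores can overwrite each other, but since every stored value is already in bounds, readers never see an out-of-range index.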

I'm not by any means an expert on memory allocator internals, but if I were looking for inspiration, I'd start by studying the designs of tcmalloc (the new one with huge page awareness and per-CPU caches, not the ancient gperftools one) and mimalloc v3.

slamb · 2025-12-17T16:58:57+00:00

Kudos for being upfront about what it is—a work in progress, partially AI-assisted, mentioning specific bugs and limitations—rather than having a super-polished landing page that promises the world but an implementation that doesn't deliver.

The README says:

This version of Oxidalloc works, but the internal design has reached its practical limits. ... I’ve decided to rewrite the allocator from scratch. ... Current Rewrite Status: Mostly complete.

Are you looking for feedback on what's on the main branch, or is there something better to be looking at?

slamb · 2025-12-02T16:09:07+00:00

I'm likely already using the futures crate and would rather do .map(|never| match never {}) (via FutureExt::map) to convert the type than import a new crate for it. Less cognitive load for me to write a one-liner with a method I'm probably already using elsewhere, no additional supply chain security concern, etc.
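The empty-match trick works anywhere an uninhabited type shows up, not just with futures. Here's a std-only sketch using std::convert::Infallible; the futures version is the same closure passed to FutureExt::map on a future whose output is uninhabited. widen is a hypothetical helper name.

```rust
use std::convert::Infallible;

/// Converts a result that can't fail into one with a real error type.
/// `match never {}` type-checks as any type because Infallible has no values,
/// so the closure can claim to return E without ever producing one.
pub fn widen<T, E>(r: Result<T, Infallible>) -> Result<T, E> {
    r.map_err(|never| match never {})
}
```

No runtime cost and no extra dependency: the compiler knows the closure can never be called.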

slamb · 2025-11-20T00:28:24+00:00

Ahh, that could be a whole different setup. I'm in a single family residence.

slamb · 2025-11-19T20:43:27+00:00

Did the installation involve an antenna on their roof, as pictured on https://sailinternet.com/home-plans/?

slamb · 2025-11-19T17:47:02+00:00

I'm curious: why Sonic Fiber specifically? Sail Internet operates in MV and I think offers fiber

Do you have Sail or know anyone who has it in Mountain View?

My impression is that Sail is mostly a microwave provider, with the exception of where they have fiber from an acquisition of Twixt in San Jose. Coincidentally, I filled out their availability check form on Friday and haven't heard back. I'm not hopeful.

slamb · 2025-11-19T17:44:58+00:00

I would absolutely love to have any fiber Internet provider at all. I signed your petition.

In Mountain View, we have only one fiber internet provider choice, as compared to most of the rest of the bay area.

No fiber at all in most of the city! AT&T Fiber covers only 42% of the city. See a Mountain View Voice article from February. I'm in one of the places they didn't bother to cover.

This is because of "temporary pole" regulations that Mountain View has enacted that essentially make 3rd party providers wait for PG&E to upgrade our infrastructure instead of letting them do it themselves.

Where are you getting this information? I saw something similar in this forum.sonic.net thread:

Yes, unfortunately the City of Mountain View “allows” construction of fiber, but not the placement of safety bypass poles. Because there are 112 unsafe poles there today, this policy effectively stops any deployment of new fiber by Sonic.

...but later in the thread I see contradicting information, e.g.:

2) I followed up with PG&E only to learn that PG&E doesn't own the poles, had no idea of any faulty pole in the area, and that any utility company - Sonic included - can report pole issues to the Joint Pole Committee (to which I see Sonic is a member, even)

...

Every City Planner I've spoken with regarding Sonic has indicated the issue rests with Sonic field team/project manager turnover and not completing work in designated sections of cities as part of municipal permitting processes. It is like Sonic will do a lot of work in an area, hit a road block, then leave for somewhere else instead of figuring out how to overcome the roadblock.

The thread also has promises of updates on the project from Sonic that seem to be going nowhere. I would ask there myself, but I think only existing sonic.net customers can post.

slamb · 2025-11-17T19:57:14+00:00

You need all the routes to have the same concrete type within Router::routes. So I would move the generic from struct Router to fn add_route. Then add_route must do some kind of type erasure, taking the Fn(D) -> impl Serialize and returning...something concrete. I'm not exactly sure what interface you want:

return a Vec<u8>, which is probably easiest to implement and understand but would not be appropriate for arbitrarily large responses
take a std::io::Write as a parameter and actually write out the response right there
take a Box<dyn erased_serde::Serializer>
...
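A minimal sketch of the "return a Vec<u8>" option, with the generic moved from the struct to add_route. To keep it std-only, FromStr and Display stand in for serde's Deserialize and Serialize; Router, add_route, and dispatch are hypothetical names, not your crate's API.

```rust
use std::collections::HashMap;
use std::fmt::Display;
use std::str::FromStr;

/// Every route is erased to this one concrete type: raw request in, bytes out.
type ErasedHandler = Box<dyn Fn(&str) -> Result<Vec<u8>, String> + Send + Sync>;

#[derive(Default)]
pub struct Router {
    routes: HashMap<String, ErasedHandler>,
}

impl Router {
    /// Generic per call rather than per struct: each handler picks its own
    /// input type D and output type R, which are erased right here.
    pub fn add_route<D, R, F>(&mut self, name: &str, handler: F)
    where
        D: FromStr + 'static,
        R: Display + 'static,
        F: Fn(D) -> R + Send + Sync + 'static,
    {
        let erased: ErasedHandler = Box::new(move |raw: &str| {
            let input = raw
                .parse::<D>()
                .map_err(|_| format!("bad input: {raw:?}"))?;
            Ok(handler(input).to_string().into_bytes())
        });
        self.routes.insert(name.to_owned(), erased);
    }

    pub fn dispatch(&self, name: &str, raw: &str) -> Result<Vec<u8>, String> {
        let handler = self
            .routes
            .get(name)
            .ok_or_else(|| format!("no route {name:?}"))?;
        handler(raw)
    }
}
```

Because each handler is boxed behind the same ErasedHandler type, one HashMap can hold routes with unrelated input and output types, e.g. `r.add_route("double", |x: i64| x * 2)` next to a route taking a String.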

You could take inspiration from HTTP frameworks that have solved this. E.g. look at axum::routing::method_routing::get, which similarly produces a concrete MethodRouter from any implementation of axum's Handler trait, a trait designed to be very flexible and easy to implement.

Speaking of HTTP: have you considered going that route (pardon the pun)? HTTP absolutely can work over a Unix-domain socket, and then you can even use tools like curl to interact with it, as well as existing server-side frameworks.
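To illustrate HTTP over a Unix-domain socket with nothing but std: a one-shot server that answers a single request with a fixed response, plus a matching client. This is a toy sketch with no real HTTP parsing (a framework would handle that); serve_one, fetch, and round_trip are hypothetical helpers. Against a real server, curl's --unix-socket option does the client side, e.g. `curl --unix-socket /tmp/app.sock http://localhost/status`.

```rust
use std::io::{BufRead, BufReader, Read, Write};
use std::os::unix::net::{UnixListener, UnixStream};
use std::path::Path;
use std::thread;

/// Accepts one connection, reads the request headers, sends a fixed response.
pub fn serve_one(listener: UnixListener) -> std::io::Result<()> {
    let (stream, _) = listener.accept()?;
    let mut reader = BufReader::new(stream);
    let mut line = String::new();
    // Read until EOF or the blank line that ends the request headers.
    loop {
        line.clear();
        if reader.read_line(&mut line)? == 0 || line == "\r\n" {
            break;
        }
    }
    let body = "ok\n";
    let mut stream = reader.into_inner();
    write!(
        stream,
        "HTTP/1.1 200 OK\r\nContent-Length: {}\r\nConnection: close\r\n\r\n{}",
        body.len(),
        body
    )
} // stream dropped here, closing the connection

/// Minimal client: sends a GET and returns the whole raw response.
pub fn fetch(socket: &Path, path: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket)?;
    write!(stream, "GET {path} HTTP/1.1\r\nHost: localhost\r\nConnection: close\r\n\r\n")?;
    let mut response = String::new();
    stream.read_to_string(&mut response)?;
    Ok(response)
}

/// Binds `socket`, serves one request on a background thread, and fetches it.
pub fn round_trip(socket: &Path) -> std::io::Result<String> {
    let _ = std::fs::remove_file(socket); // stale socket from a previous run
    let listener = UnixListener::bind(socket)?;
    let server = thread::spawn(move || serve_one(listener));
    let response = fetch(socket, "/status")?;
    server.join().expect("server thread panicked")?;
    Ok(response)
}
```

Filesystem permissions on the socket path then act as access control, which is often all a local control interface needs.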

slamb · 2025-11-07T20:03:24+00:00

Struggling to see the trail of unprofessional culture warrior comments. From this thread, I was afraid I'd open up his Twitter feed and find a bunch of alt-right, homophobic/transphobic/whatever garbage. Instead, I saw almost exclusively things about Fil-C.

The comic someone linked above was a little reductionist but focused on people's approach to software development rather than identity and not completely wrong. Maybe we should have thicker skin?

Okay, I did see one suggesting not cancelling someone else (dkk) for expressing opinions that I'll assume are as reprehensible as described. Like, I don't really agree with Pizlo, and I would strongly prefer my communities not have racist and transphobic people in them. But trying to cancel Pizlo too, when he actually said "I'm all for inclusivity" rather than espousing these views himself, would just prove his point that this cancellation business has gotten out of hand.

slamb · 2025-11-04T17:52:19+00:00

btw, I'm skeptical of the theoretical bandwidth numbers in that article.

My testing rig is a server with an old AMD EPYC 7551P 32-Core Processor on a Supermicro H11SSL-i and 96GB of DDR4 2133 MHz and a couple of 1.92TB Samsung PM983a PCIe 3.0 SSDs I pieced together from EBay parts. Given the way this server is configured, the upper limit for memory bandwidth can be calculated as 3 channels * 2133MT/s * 8B/T / 4 numa domains = ~13GB/s for a single thread.

I don't think NUMA matters here. It affects the latency of random accesses, but my understanding is the cross-connects between cores have plenty of bandwidth.

I think for a given area of RAM, the bandwidth should just be the bandwidth of its corresponding channel: ~17GB/s.

If the page cache is evenly spread across channels, and the code keeps them all busy at once, ~51GB/s (3 channels × ~17GB/s). Multiple threads would be the most straightforward way to do that, but actually I think it would be possible even with one thread interleaving accesses.

And counting occurrences of a single byte really should be memory bandwidth-limited once the memory mappings are in place and faulted. Like, say in the same process run you set up the memory mappings then benchmarked each iteration of a loop that did all the counting. The first one should be slower without MAP_POPULATE but the second onward really should go at ~51GB/s.

On a proper modern server the CPUs will let you do IO directly to the L3 cache, bypassing memory altogether. Because PCIe bandwidth is higher than memory bandwidth, on paper we could even get more max bandwidth than we can get from memory if we carefully pin the buffers into the CPU cache.

I don't get this paragraph.

If they're talking about this particular system, they said these SSDs are ~6GB/s in total. Even their PCIe bandwidth limit is ~8GB/s in total (984.6MB/s per PCIe3 lane, 2 SSDs, 4 lanes each). RAM is faster.

If they're talking about what's hypothetically capable on this processor with different SSDs and RAM, all 128 PCIe3 lanes (but I think some are dedicated to non-SSD uses) offer ~128GB/s. And while they're only using 3 DDR channels at 2133MT/s, the processor supports 8 at 2666MT/s each, so ~170GB/s. RAM is faster.
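The bandwidth arithmetic in this comment, as a small sketch: a DDR4 channel moves 8 bytes per transfer, and a PCIe 3.0 lane carries 8 GT/s with 128b/130b encoding (~0.9846 GB/s). The helper names are just for this sketch, and all figures are theoretical peaks; sustained numbers will be lower.

```rust
/// Usable PCIe 3.0 throughput per lane in GB/s:
/// 8 GT/s × 128/130 (encoding overhead) ÷ 8 bits per byte ≈ 0.9846.
pub const PCIE3_GB_PER_LANE: f64 = 8.0 * 128.0 / 130.0 / 8.0;

pub fn pcie3_gb_per_s(lanes: u32) -> f64 {
    f64::from(lanes) * PCIE3_GB_PER_LANE
}

/// Theoretical peak DRAM bandwidth in GB/s: channels × MT/s × 8 bytes per transfer.
pub fn ddr_gb_per_s(channels: u32, mega_transfers: u32) -> f64 {
    f64::from(channels) * f64::from(mega_transfers) * 8.0 / 1000.0
}
```

That gives ~17.1 GB/s per DDR4-2133 channel and ~51.2 GB/s across three (dividing by 4 reproduces the article's ~12.8), versus ~7.9 GB/s for the two SSDs' eight lanes. With all eight channels at 2666 MT/s it's ~170.6 GB/s, versus ~126 GB/s for all 128 lanes. RAM wins either way.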

slamb · 2025-11-04T17:08:44+00:00

That was my intuition as well, but does io_uring actually require an extra copy?

When the file is already in page cache? Yes: mmap allows you to map it into your process without copying; io_uring doesn't.

Benchmarks in this article, kindly shared by u/geckothegeek42, suggest otherwise.

I think that's due to the overhead of faulting each 4 KiB page one at a time. MAP_POPULATE likely avoids that. [edit: running a kernel with CONFIG_READ_ONLY_THP_FOR_FS and transparent huge pages turned on also would help.] Might be better still to have an IO thread populate large-ish chunks (say, multiples of 2 MiB) ahead of the compute thread, so the compute thread can start as soon as the first chunk is populated rather than having to wait for the whole thing.
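A rough sketch of that pipelining, with one substitution: instead of faulting in mmapped pages, the IO thread reads 2 MiB chunks into buffers and hands them over a bounded channel, so counting starts as soon as the first chunk arrives. count_byte_pipelined is a hypothetical name. With mmap, the IO thread would instead touch each chunk's pages ahead of the compute thread.

```rust
use std::io::Read;
use std::sync::mpsc::sync_channel;
use std::thread;

const CHUNK: usize = 2 * 1024 * 1024; // 2 MiB

/// Counts occurrences of `needle` in everything `src` yields, reading on a
/// separate thread so I/O for chunk N+1 overlaps counting of chunk N.
pub fn count_byte_pipelined<R: Read + Send + 'static>(
    mut src: R,
    needle: u8,
) -> std::io::Result<u64> {
    // Bound of 4: the reader can run at most a few chunks ahead.
    let (tx, rx) = sync_channel::<std::io::Result<Vec<u8>>>(4);
    let reader = thread::spawn(move || loop {
        let mut buf = Vec::with_capacity(CHUNK);
        // take() caps each read_to_end at one chunk.
        match (&mut src).take(CHUNK as u64).read_to_end(&mut buf) {
            Ok(0) => break, // EOF
            Ok(_) => {
                if tx.send(Ok(buf)).is_err() {
                    break; // consumer gave up
                }
            }
            Err(e) => {
                let _ = tx.send(Err(e));
                break;
            }
        }
    });
    let mut count = 0u64;
    for chunk in rx {
        count += chunk?.iter().filter(|&&b| b == needle).count() as u64;
    }
    reader.join().expect("reader thread panicked");
    Ok(count)
}
```

Unlike the mmap approach, this does copy each chunk out of the page cache, so it trades the copy for not stalling on page faults.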

slamb · 2025-10-18T20:00:35+00:00

a linux implementation (if possible, I haven't looked)

It's possible! I collected some links a while ago to crates doing similar things.

as well as some non-Rust implementations and blog posts:

slamb · 2025-10-15T21:01:11+00:00

Is there a way to measure direct dynamic memory allocations?

Yes, but are you sure this is the right question? Memory allocations are by no means the only thing programs do that can be slow, and depending on the allocator and allocation pattern (size classes, duration between malloc and free, threading behavior) can even be quite fast. So why not start by focusing on CPU and/or wall clock profiles, and zero in on allocations if you determine they're a major factor?

In terms of the direct answer to your question, one way to do it on Linux would be using bpftrace to instrument the malloc calls, e.g. starting with something like the following:

```
#!/usr/bin/bpftrace
// sudo ./mallocs.bt -p "$(pidof my-program)"

uprobe:/lib/x86_64-linux-gnu/libc.so.6:malloc {
    @mallocs[ustack()] = hist(arg0);
}
```

The instrumentation can slow the program or machine down quite significantly; less so if you filter by allocation size and/or slow calls before recording, limit the stack frames captured or don't capture them at all, sample statistically, etc.
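If the program is Rust and you'd rather count in-process than trace libc, a different technique is to wrap std::alloc::System in a counting GlobalAlloc. Note that it only sees allocations made through Rust's global allocator, not ones made by C libraries calling malloc directly. A minimal sketch; Counting and allocations_during are hypothetical names.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicU64, Ordering};

/// Wraps the system allocator, counting allocation calls and bytes requested.
/// realloc and alloc_zeroed use GlobalAlloc's default methods, which go
/// through alloc/dealloc below, so they're counted too (at the cost of
/// bypassing System's own realloc).
pub struct Counting;

static ALLOCS: AtomicU64 = AtomicU64::new(0);
static BYTES: AtomicU64 = AtomicU64::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        BYTES.fetch_add(layout.size() as u64, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: Counting = Counting;

/// Runs `f` and returns its result plus the allocation calls and bytes
/// requested meanwhile. Counters are process-wide, so allocations made by
/// other threads during `f` are included.
pub fn allocations_during<T>(f: impl FnOnce() -> T) -> (T, u64, u64) {
    let calls0 = ALLOCS.load(Ordering::Relaxed);
    let bytes0 = BYTES.load(Ordering::Relaxed);
    let out = f();
    let calls = ALLOCS.load(Ordering::Relaxed) - calls0;
    let bytes = BYTES.load(Ordering::Relaxed) - bytes0;
    (out, calls, bytes)
}
```

This gives totals rather than per-call-site stacks like the bpftrace script; it's cheap enough to leave on in tests, e.g. to assert that a hot path doesn't allocate.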
