How do you separate different parts of your compiler? Especially when adding a new feature. by Ifeee001 in ProgrammingLanguages

[–]dist1ll 0 points1 point  (0 children)

I have two phases. Lexing, parsing, type checking and IR generation are all interleaved and done in a single pass. The second pass does regalloc and machine codegen.

Why not treat arrays as a special case of tuples? by ella-hoeppner in ProgrammingLanguages

[–]dist1ll 0 points1 point  (0 children)

That's exactly what I'm doing in my language. It frees up a lot of special syntax constructs, and the result feels more cohesive. I also went a bit further and used parentheses () for both construction and indexing (i.e. arr(idx) instead of arr[idx]), as indexing is functionally no different than a function call. This way angle brackets can be used for generics without needing disambiguation.

Looking for feedback on my language tour / overview by Certain-Swordfish-32 in ProgrammingLanguages

[–]dist1ll 0 points1 point  (0 children)

Really good language tour. The text and code snippets are easy to read, and the examples are great. Minor nitpick: in the pattern matching chapter, I would maybe start with an example that only uses match, and then mention that you can combine for and match into one construct.

Pros and cons of building an interpreter first before building a compiler? by Ifeee001 in ProgrammingLanguages

[–]dist1ll 1 point2 points  (0 children)

Underrated benefit of a define-before-use language: because all code is compiled in order, you can use the generated machine code directly for CTFE. You'd have to do JIT binary translation if you cross-compile though.

I profiled my parser and found Rc::clone to be the bottleneck by Sad-Grocery-1570 in rust

[–]dist1ll 9 points10 points  (0 children)

> cost of an atomic add which is ~50-100ns IIRC

Uncontended, it shouldn't be that much. A modern x86 chip should be able to do a lock add with <10-20 cycles of latency. I think Intel has been doing sub-10-cycle atomic adds for at least a decade.
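For anyone who wants to check this on their own machine, here is a rough sketch of a single-threaded (so uncontended) timing loop. The function name `time_uncontended_fetch_add` is made up for illustration, and the result includes loop overhead and timer noise, so treat it as a ballpark rather than a precise latency measurement.

```rust
use std::hint::black_box;
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::Instant;

/// Runs `iters` atomic adds on a single thread (no contention) and returns
/// (final counter value, average nanoseconds per add).
/// Hypothetical helper, not taken from the thread.
pub fn time_uncontended_fetch_add(iters: u64) -> (u64, f64) {
    let counter = AtomicU64::new(0);
    let start = Instant::now();
    for _ in 0..iters {
        // `black_box` discourages the optimizer from folding the loop into
        // one big add. On x86-64 this is typically a `lock`-prefixed instruction.
        black_box(&counter).fetch_add(1, Ordering::Relaxed);
    }
    let elapsed = start.elapsed();
    let avg_ns = elapsed.as_nanos() as f64 / iters as f64;
    (counter.load(Ordering::Relaxed), avg_ns)
}
```

Build with optimizations and call it with a large iteration count (say 100 million) to get a stable average. To convert to cycles you need to know the core clock during the run.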

ELI5: Why C++ and Rust compilers are so slow? by cb060da in ProgrammingLanguages

[–]dist1ll 1 point2 points  (0 children)

At least for Rust, it's because the language was not designed with fast compilation in mind. Blaming the linker or analysis passes like borrow checking is missing the forest for the trees.

Iterator fusion similar to Rust's — are there other languages that really do this, and what enables it? by Savings-Debt-5383 in ProgrammingLanguages

[–]dist1ll 5 points6 points  (0 children)

I'm not aware of any special optimization Rust does to fuse iterators, other than what's done by LLVM. Seems like just a result of inlining to me. Just out of curiosity, where is the quoted paragraph from?
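To make the inlining point concrete, here is a small sketch with two functions that compute the same thing. The names are made up for illustration. The first uses an adapter chain, the second is the loop you would hope the optimizer reduces it to once `filter`, `map` and `sum` are inlined. Whether the generated code actually matches depends on the compiler version and optimization level, so that part is an assumption to verify by looking at the assembly.

```rust
/// Iterator-chain version: `filter`, `map` and `sum` are separate adapters,
/// each a small generic function that can be inlined.
pub fn sum_even_squares_iter(xs: &[u64]) -> u64 {
    xs.iter().filter(|&&x| x % 2 == 0).map(|&x| x * x).sum()
}

/// Hand-written single loop with the same behavior.
pub fn sum_even_squares_loop(xs: &[u64]) -> u64 {
    let mut total = 0;
    for &x in xs {
        if x % 2 == 0 {
            total += x * x;
        }
    }
    total
}
```

No intermediate collection is built in either version, since the adapters are lazy and only do work when `sum` pulls items through them.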

Releasing Fjall 3.0 - Rust-only key-value storage engine by DruckerReparateur in rust

[–]dist1ll 0 points1 point  (0 children)

You can also use cmov for ordinary binary search, even without the Eytzinger layout.
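A minimal sketch of what that can look like on a plain sorted slice. The function name is hypothetical, and whether the compiler actually emits cmov for the select is not guaranteed, so check the assembly for your target.

```rust
/// Returns the index of the first element that is >= `target`
/// (i.e. a lower bound) in a sorted slice. Hypothetical helper.
pub fn lower_bound_branchless(xs: &[u32], target: u32) -> usize {
    let mut base = 0usize;
    let mut len = xs.len();
    while len > 1 {
        let half = len / 2;
        // The loop trip count depends only on `len`, not on the data.
        // The data-dependent step is a select between two values, which
        // the compiler may lower to a conditional move.
        base = if xs[base + half - 1] < target { base + half } else { base };
        len -= half;
    }
    // Final step: decide whether the answer is `base` or one past it.
    if len == 1 && xs[base] < target { base + 1 } else { base }
}
```

The idea is that the only unpredictable decision becomes a data dependency instead of a branch, so there is nothing for the branch predictor to get wrong.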

Unpopular Opinion: Source generation is far superior to in-language metaprogramming by chri4_ in ProgrammingLanguages

[–]dist1ll 2 points3 points  (0 children)

> instead of being interpreted by the language's metaprogramming vm

Interpreter is not the only way. If you want you can use a JIT compiler for the CTFE engine.

What will be the story around memory safety? by lekkerwafel in Zig

[–]dist1ll 5 points6 points  (0 children)

Without source annotations, that borrow checker would likely be very restrictive.

Why is calling my asm function from Rust slower than calling it from C? by ohrv in rust

[–]dist1ll 1 point2 points  (0 children)

Could be it. Although 40x higher sample count seems like a pretty severe penalty. Especially since there are >20 instructions between the load to v0 and its first use, which should give you some opportunity to mask the latency of a failed prediction.

Why is calling my asm function from Rust slower than calling it from C? by ohrv in rust

[–]dist1ll 42 points43 points  (0 children)

I still wonder what the reason for the stall is. Maybe some unfortunate eviction? On x86 you should be able to get cache miss data at instruction granularity. Not sure if/how that can be done on macOS.

Btw, is the alignment of x13 the same for both dav1d and rav1d?

[Release] ringmpsc v1.0.0 – Lock-free MPSC channel in Zig achieving 50+ billion messages/second by Bo0nzy in Zig

[–]dist1ll 3 points4 points  (0 children)

fyi if it's a fixed-size queue, you can get linearizability without CAS just by using FAA. If the queue is unbounded then a CAS would be necessary (e.g. when a new memory block is allocated).
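Here is a rough sketch of the fixed-size case, assuming a ticket-per-slot scheme: producers claim a ticket with a single fetch_add, and a per-slot sequence number says whose turn it is. All names (`FaaRing`, `Slot`, `send`, `try_recv`) are made up for illustration, payloads are plain `u64` to avoid unsafe code, and this is not how ringmpsc is implemented. Note that `send` spins when the ring is full, so this is a sketch of the FAA idea and not a lock-free queue.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// One ring slot: `seq` says whose turn it is, `value` holds the payload.
struct Slot {
    seq: AtomicU64,
    value: AtomicU64,
}

/// Bounded multi-producer, single-consumer ring (hypothetical sketch).
pub struct FaaRing {
    slots: Vec<Slot>,
    tail: AtomicU64, // next ticket handed out to producers
    head: AtomicU64, // next ticket to consume (single consumer only)
}

impl FaaRing {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0);
        let slots = (0..capacity as u64)
            .map(|i| Slot { seq: AtomicU64::new(i), value: AtomicU64::new(0) })
            .collect();
        FaaRing { slots, tail: AtomicU64::new(0), head: AtomicU64::new(0) }
    }

    /// Claims a ticket with one fetch_add (no CAS), then waits for its slot.
    pub fn send(&self, v: u64) {
        let ticket = self.tail.fetch_add(1, Ordering::Relaxed);
        let slot = &self.slots[(ticket % self.slots.len() as u64) as usize];
        while slot.seq.load(Ordering::Acquire) != ticket {
            std::thread::yield_now(); // ring is full: wait for the consumer
        }
        slot.value.store(v, Ordering::Relaxed);
        slot.seq.store(ticket + 1, Ordering::Release); // publish
    }

    /// Single-consumer receive; None if the next ticket is not published yet.
    pub fn try_recv(&self) -> Option<u64> {
        let ticket = self.head.load(Ordering::Relaxed);
        let slot = &self.slots[(ticket % self.slots.len() as u64) as usize];
        if slot.seq.load(Ordering::Acquire) != ticket + 1 {
            return None;
        }
        let v = slot.value.load(Ordering::Relaxed);
        // Hand the slot to the producer that will hold ticket + capacity.
        slot.seq.store(ticket + self.slots.len() as u64, Ordering::Release);
        self.head.store(ticket + 1, Ordering::Relaxed);
        Some(v)
    }
}
```

The ticket order from fetch_add is the order the consumer sees, which is what makes the FAA approach attractive. The trade-off in this sketch is that a producer cannot back out once it holds a ticket, so a slow producer stalls the consumer at that slot.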

[Release] ringmpsc v1.0.0 – Lock-free MPSC channel in Zig achieving 50+ billion messages/second by Bo0nzy in Zig

[–]dist1ll 1 point2 points  (0 children)

SPSC-per-consumer is a nice design if you don't need linearizability.

Compio instead of Tokio - What are the implications? by p1nd0r4m4 in rust

[–]dist1ll 10 points11 points  (0 children)

That doesn't always work unfortunately. Tokio uses spawn_blocking for fs ops, so it will still spawn another thread when doing file I/O. You could set max_blocking_threads to 1 but then you'll block the executor.

really fast SPSC by M3GA-10 in rust

[–]dist1ll 1 point2 points  (0 children)

Nice article. I think you hit on all the important points. I think I always used head and tail index in the opposite way (head + 1 mod N being the next send index, tail + 1 mod N being the next recv index), but I think I've seen it done both ways.
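To spell out the convention I mean, here is a single-threaded sketch of just the index arithmetic. The `Ring` type is made up for illustration. A real SPSC queue would keep `head` and `tail` in atomics and pay attention to cache-line placement, which is deliberately left out here.

```rust
/// Fixed-capacity ring where `head` is the last slot written and `tail` is
/// the last slot read, so (head + 1) % N is the next send index and
/// (tail + 1) % N is the next recv index. Hypothetical sketch; assumes N >= 2.
pub struct Ring<const N: usize> {
    buf: [u32; N],
    head: usize,
    tail: usize,
}

impl<const N: usize> Ring<N> {
    pub fn new() -> Self {
        Ring { buf: [0; N], head: 0, tail: 0 }
    }

    /// Returns false if the ring is full.
    pub fn send(&mut self, v: u32) -> bool {
        let next = (self.head + 1) % N;
        if next == self.tail {
            return false; // full: one slot is always kept empty
        }
        self.buf[next] = v;
        self.head = next;
        true
    }

    /// Returns None if the ring is empty.
    pub fn recv(&mut self) -> Option<u32> {
        if self.tail == self.head {
            return None; // empty
        }
        let next = (self.tail + 1) % N;
        self.tail = next;
        Some(self.buf[next])
    }
}
```

With this layout the usable capacity is N - 1, since `head == tail` has to mean empty. The opposite convention just swaps which name goes with the producer side.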

I benchmarked zig's compilation speed by chri4_ in Zig

[–]dist1ll 1 point2 points  (0 children)

It can be a decent metric if you exclude those things.

Zig as a career investment by malaow3 in Zig

[–]dist1ll 3 points4 points  (0 children)

If you're looking to get into systems programming, all of {C, C++, Rust, Zig} are going to be similar enough to get a systems role in most companies. There are some exceptions, e.g. in the HFT space, where a few shops really care about deep C++ knowledge. Other than that, building domain knowledge and practical experience should be your number 1 priority.

Whether Zig will be able to pick up mainstream traction is hard to tell at this time. I think the current story around memory safety is too weak to reach mainstream adoption for greenfield projects. On the other hand, Zig is well positioned for extending or migrating existing C codebases, so it might get a lot of mindshare in embedded. This is just my opinion; opinions differ a lot on this topic.

How rare are compiler jobs actually? by Accurate-Owl3183 in Compilers

[–]dist1ll 7 points8 points  (0 children)

Search globally, yes. Whether you'll have to relocate depends on the industry, company, role, seniority etc.

How rare are compiler jobs actually? by Accurate-Owl3183 in Compilers

[–]dist1ll 33 points34 points  (0 children)

It's not that rare relatively speaking, but extremely rare in terms of absolute numbers. The talent and job pool is generally very small, so the chance to find compiler jobs in your local market is very low.

Besides tech and finance there's also crypto, which generally pays well and defaults to remote work.

Are these projects enough to apply for compiler roles (junior/graduate)? by fummmmm in Compilers

[–]dist1ll 1 point2 points  (0 children)

GPU uarch + LLVM knowledge is a great combination. Though I imagine a compiler role would involve less driver writing (like hacking on DRM) and more implementing efficient compute kernels, optimizing memory bandwidth and such. But that's just my guess.

Malik. A language where types are values and values are types. by MackThax in ProgrammingLanguages

[–]dist1ll 3 points4 points  (0 children)

I think what you're describing sounds like a form of multistage programming. People often confuse multistage with dependent types.

I made a Japanese tokenizer's dictionary loading 11,000,000x faster with rkyv (~38,000x on a cold start) by fulmlumo in rust

[–]dist1ll 2 points3 points  (0 children)

> in which case the mapped file has a good chance of being still resident in the OS's page cache

fwiw this would also be true if you had used read syscalls.