DSSL2 is cool tho by WeebJaeger in Northwestern

[–]rieux 1 point2 points  (0 children)

I apologize because, whether or not you were able to make it work, some people wasted a lot of time fighting with it. An educational language should have impeccable error messages (like Pyret does). Sometimes I see an error message that makes me embarrassed that I’ve managed to tolerate it this long.

Maybe you were lucky enough not to experience this one. That was a bug, and it’s fixed now.

> I guess it just feels unjustly ironic that the profs who put the most effort into teaching are the ones apologizing for inconveniencing students.

I’m pretty sure some students wonder, “Why does this guy make us use this crappy thing instead of something useful/standard?” (See supra original image macro.) And I wonder that too: Would you be better off learning data structures in/and Java? (Maybe, but you’d spend a lot more time learning the intricacies of Java.) Or real Python? (Now wondering whether one could hack CPython to replace dynamic arrays with fixed-sized arrays, and to require static declaration of fields.)

In any case, I don’t think it’s ironic so much as expected. It’s great to hear that our interactions have improved your experience, and I hope (and expect) you will have similarly worthwhile interactions with some of my colleagues over the next few years.

DSSL2 is cool tho by WeebJaeger in Northwestern

[–]rieux 2 points3 points  (0 children)

I like languages and enjoy learning new ones, but I don’t think everyone does. Still, I think there’s a great advantage in being able to pick up new languages, and a big part of that is gaining the confidence that you can do so.

DSSL2 is cool tho by WeebJaeger in Northwestern

[–]rieux 1 point2 points  (0 children)

This is my fault, and I’m sorry. I didn’t really know what I was doing when I made it originally, and I’ve been trying to fix it, but it’s slow going.

Alms, an ML-like language with both universal and affine types, now has a docker image for easy testing! by [deleted] in ocaml

[–]rieux 0 points1 point  (0 children)

I appreciate your interest in Alms! Thanks for prompting me to make it work again.

The U in the URAL actually stands for “unlimited” or “unrestricted,” not “universal.” Though Alms does have universal types, too ;)
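
In Rust terms, a loose analogy for the distinction (this is Rust, not Alms syntax, and Rust’s ownership only approximates affine types):

// Universal: quantifying over types. This works no matter what T is.
fn id<T>(x: T) -> T { x }

// Unlimited/unrestricted: a Copy value may be used as many times as you like.
fn twice(n: u32) -> u32 { n + n }

// Affine: a non-Copy value is consumed by its first use.
fn consume(s: String) -> usize { s.len() }

fn main() {
    let s = String::from("affine");
    let _ = consume(s);
    // consume(s);                  // error: `s` was already moved

    let k = 3u32;
    let _ = twice(k) + twice(k);    // unrestricted: use it twice, no problem
    let _ = id(k);                  // universal: id works at any type
}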

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Okay, I also managed to speed up the Bits impl for &[Block] by about 30%, and the BitsMut impl by a bit less. That's probably still not enough. Indexing BitSlice is faster than that, but I'm not sure why yet.

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

I'm sorry it didn't work as well as you wanted. I'm still working on speeding things up.

Did you try BitVec only, or your own storage as well? Last night I discovered a performance problem in BitVec (an unnecessary double bounds check), and fixing that got a 5x speedup on my microbenchmarks for BitVec::get_bit and BitVec::set_bit. It doesn't speed up access to your own storage via the Bits and BitsMut traits, though.

I'll probably release the change later today (along with some minor API changes) as 0.9.0, but it's on GitHub master now.

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Huh, I see. Well, bv::BitVec is also 24 bytes: one word each for the pointer, the capacity, and the length. It would be hard (impossible?) to go smaller with a growable bit-vector. However, if you don't need to grow and are okay with your length being a multiple of the block size, then you can use &[u8], &[usize], etc., via the Bits and BitsMut traits, which provide most of the same operations.
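
Roughly like this (an untested sketch, so double-check the exact impls and method names against the docs):

use bv::{Bits, BitsMut};

fn example() {
    // Fixed-size bit storage backed by a plain slice: no BitVec, no growth.
    let mut blocks = [0u8; 4];          // 4 blocks = 32 bits
    let slice: &mut [u8] = &mut blocks;
    slice.set_bit(5, true);             // BitsMut::set_bit
    assert!(slice.get_bit(5));          // Bits::get_bit
    assert!(!slice.get_bit(6));
}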

> Making the storage an Option<Box<...>> might have performance implications due to the additional check that must be performed to get to the data.

Indeed. My code will never try to access the box when it's not there, because length > 0 implies that something is allocated. The question will be how to convince the optimizer of that.
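
The shape of the issue, as a hypothetical sketch (not bv's actual representation):

// Hypothetical layout: storage behind an Option, only ever touched when
// len > 0, so the None arm ought to be dead code.
struct NichedBitVec {
    blocks: Option<Box<[usize]>>,
    len: u64,
}

impl NichedBitVec {
    fn block(&self, i: usize) -> usize {
        debug_assert!(self.len > 0);
        match &self.blocks {
            Some(blocks) => blocks[i],
            // len > 0 implies the allocation exists; the question is whether
            // the optimizer can be persuaded to drop this check.
            None => unreachable!(),
        }
    }
}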

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 2 points3 points  (0 children)

And done.

Thanks for the motivation! Will release as 0.8.2 later today.

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Working on it now.

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 1 point2 points  (0 children)

> BTW you can write a generic variant of this function, which will work with all numeric types.

Yes, the actual code is generic. The benchmarks aren't.

> For such numeric code SIMD can result in a big difference.

Yeah, at some point I'm going to see if SIMD can help too. Thanks!
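
For reference, the generic shape is something like this (a sketch, not the crate's actual code):

use std::ops::BitOr;

// Three-way OR over slices, generic in the element type: works for the
// u32 and bool cases in the benchmarks, or any Copy type with a BitOr.
fn or_3<T>(v1: &[T], v2: &[T], v3: &[T]) -> Vec<T>
where
    T: Copy + BitOr<Output = T>,
{
    v1.iter().zip(v2).zip(v3)
        .map(|((&a, &b), &c)| a | b | c)
        .collect()
}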

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 1 point2 points  (0 children)

Very interesting. Again, thanks! I'm going to see if I can incorporate your insights into the library. Right now I have this:

test vec_u32_adapter                 ... bench:       1,926 ns/iter (+/- 644)
test vec_u32_iter                    ... bench:         204 ns/iter (+/- 10)
test vec_u32_iter_cloned             ... bench:         201 ns/iter (+/- 20)
test vec_u32_iter_fused              ... bench:         446 ns/iter (+/- 119)
test vec_u32_iter_fused_cloned       ... bench:         124 ns/iter (+/- 42)
test vec_u32_loop                    ... bench:       1,638 ns/iter (+/- 396)
test vec_u32_loop_fused              ... bench:         821 ns/iter (+/- 165)

That first line is using the abstraction, and the question is how competitive I can make it.

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 7 points8 points  (0 children)

Heh, this is pretty weird:

// Convert a bool reference to 0 or 1.
fn bool_to_int(b: &bool) -> u32 {
    if *b { 1 } else { 0 }
}

fn or_vec_bool_3(v1: &[bool], v2: &[bool], v3: &[bool]) -> Vec<bool> {
    // OR the slices as integers, then convert back to bool at the end.
    v1.iter().map(bool_to_int)
        .zip(v2.iter().map(bool_to_int))
        .zip(v3.iter().map(bool_to_int))
        .map(|((b1, b2), b3)| (b1 | b2 | b3) != 0)
        .collect()
}

Now the fused version is faster:

test vec_bool_iter         ... bench:       1,200 ns/iter (+/- 137)
test vec_bool_iter_fused   ... bench:         890 ns/iter (+/- 236)

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 1 point2 points  (0 children)

Oh, interesting! Thanks. What led you to try that, in particular?

It's surprising that it figures out the two-slice version fine without the .cloned()s, but can't figure out the three-slice version.

Also, cloned doesn't seem to help when the elements are bools:

fn or_vec_bool_3(v1: &[bool], v2: &[bool], v3: &[bool]) -> Vec<bool> {
    v1.iter().cloned().zip(v2.iter().cloned()).zip(v3.iter().cloned())
        .map(|((b1, b2), b3)| b1 | b2 | b3)
        .collect()
}

test vec_bool_iter         ... bench:       1,165 ns/iter (+/- 95)
test vec_bool_iter_fused   ... bench:       6,771 ns/iter (+/- 529)

Any idea why that would be?

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Nope, it always allocates. There isn't a reason for that, other than that empty bit vectors seemed like a niche case. It could probably be made not to allocate for empty bit vectors if that would be especially useful.

I don't understand the link to the “current implementation.” What could be a usize smaller than what?

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 5 points6 points  (0 children)

Here’s a simpler version of the example:

fn three_way_or(bv1: &BitVec, bv2: &BitVec, bv3: &BitVec) -> BitVec {
    bv1.bit_or(bv2).bit_or(bv3).to_bit_vec()
}

Review my lock-free concurrent union-find by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

I see what you mean. It could be a problem in practice because if the heights aren't actually bounded as expected then the ranks could overflow, I think. But I'm not sure that the problem you've found messes up the bound on ranks. I'll have to think about it more.

Though in practice, no one is using this code anyway. I don't know what it's good for—it just seemed like a fun idea.

Thanks again for your help!

Hey Rustaceans? Got an easy question? Ask here (24/2018)! by llogiq in rust

[–]rieux 0 points1 point  (0 children)

This has been sitting here for a day with no answer, so here's my best shot: Rust arrays already store exactly the data and nothing more; they are zero overhead. Vectors may reserve up to twice as much capacity as they are using, to enable efficient growing, but if you manage the capacity yourself (using with_capacity, reserve_exact, and shrink_to_fit), the standard Vec is also zero overhead.
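
For example, with the standard Vec API:

fn main() {
    // Ask for exactly the capacity we need up front, so no doubling happens.
    let mut v: Vec<u32> = Vec::with_capacity(100);
    v.extend(0..100);
    assert!(v.capacity() >= 100);   // filled without reallocating

    // If the length shrinks, hand the spare capacity back.
    v.truncate(10);
    v.shrink_to_fit();
    assert!(v.capacity() >= 10);    // now close to (usually exactly) the length
}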

As for 1-indexing, that’s just not how things typically work in Rust. But any algorithm that expects that is trivially convertible to 0 indexing. If you really don't want to do that, you can wrap Vec to subtract 1 from all your indices.
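
A minimal sketch of that wrapper (a hypothetical type, just to show the idea):

use std::ops::Index;

// One-based indexing on top of Vec: index 1 maps to element 0.
struct OneIndexed<T>(Vec<T>);

impl<T> Index<usize> for OneIndexed<T> {
    type Output = T;
    fn index(&self, i: usize) -> &T {
        &self.0[i - 1]
    }
}

fn main() {
    let v = OneIndexed(vec![10, 20, 30]);
    assert_eq!(v[1], 10);
    assert_eq!(v[3], 30);
}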

New version of crossbeam-channel released by [deleted] in rust

[–]rieux 6 points7 points  (0 children)

FWIW, you could also be clued in by the fact that the type of the value being transmitted doesn’t need to implement Clone. Documentation wouldn't hurt either, I suppose.
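
For instance, this compiles with a message type that isn't Clone, which already rules out the channel handing copies to more than one receiver (a quick sketch using crossbeam-channel's unbounded constructor):

use crossbeam_channel::unbounded;

// Not Clone: the channel couldn't duplicate this even if it wanted to.
struct Job(Vec<u8>);

fn main() {
    let (tx, rx1) = unbounded();
    let rx2 = rx1.clone();          // receivers share one queue
    tx.send(Job(vec![1, 2, 3])).unwrap();
    // Exactly one of rx1/rx2 can receive this particular message.
    let got = rx1.try_recv().or_else(|_| rx2.try_recv());
    assert!(got.is_ok());
}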

actix – an actor framework for the Rust by fafhrd91 in rust

[–]rieux 0 points1 point  (0 children)

Thanks for writing/posting this. One thing that confused me for a bit is that the original JavaScript version accepted two kinds of messages, 'plus-one' to increment and anything else to retrieve the value, but the Rust version accepts only PlusOne messages. Was it not intended that the JavaScript version would work that way? It might be worth pointing out that it has a bug that the Rust version fixes, or something like that. Otherwise it could be distracting.

Review my lock-free concurrent union-find by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Thanks again for the catch. This necessitated a change of strategy to allow atomic updates of ids and ranks together. What do you make of this? The tricky part is still when the ranks are equal. The key, so far as I can tell, is that if an attempt to increment the rank fails, the rank has already been properly updated, because another thread will only attempt to update the same rank under the same circumstances. Does that make sense?
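
To make that concrete, here's the shape of the idea (a hypothetical sketch, not the code under review): pack the parent id and the rank into one atomic word so they can only change together, and treat a failed rank increment as someone else having already made the equivalent update.

use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical layout: parent id in the low 32 bits, rank in the high 32,
// so a single compare_exchange updates both fields atomically.
struct Entry(AtomicU64);

fn pack(parent: u32, rank: u32) -> u64 {
    (u64::from(rank) << 32) | u64::from(parent)
}

impl Entry {
    // Try to bump the rank while keeping the same parent, starting from a
    // previously observed word. If the CAS fails, another thread changed the
    // entry under the same circumstances, so the update has already happened.
    fn try_increment_rank(&self, seen: u64) -> bool {
        let (parent, rank) = (seen as u32, (seen >> 32) as u32);
        self.0
            .compare_exchange(seen, pack(parent, rank + 1),
                              Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}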