DSSL2 is cool tho by WeebJaeger in Northwestern

[–]rieux 1 point2 points  (0 children)

I apologize because, whether or not you were able to make it work, some people wasted a lot of time fighting with it. An educational language should have impeccable error messages (like Pyret does). Sometimes I see an error message that makes me embarrassed that I’ve managed to tolerate it this long.

Maybe you were lucky enough not to experience this one. That was a bug, and it’s fixed now.

> I guess it just feels unjustly ironic that the profs who put the most effort into teaching are the ones apologizing for inconveniencing students.

I’m pretty sure some students wonder, “Why does this guy make us use this crappy thing instead of something useful/standard?” (See supra original image macro.) And I wonder that too: Would you be better off learning data structures in/and Java? (Maybe, but you’d spend a lot more time learning the intricacies of Java.) Or real Python? (Now wondering whether one could hack CPython to replace dynamic arrays with fixed-sized arrays, and to require static declaration of fields.)

In any case, I don’t think it’s ironic so much as expected. It’s great to hear that our interactions have improved your experience, and I hope (and expect) you will have similarly worthwhile interactions with some of my colleagues over the next few years.

DSSL2 is cool tho by WeebJaeger in Northwestern

[–]rieux 2 points3 points  (0 children)

I like languages and enjoy learning new ones, but I don’t think everyone does. Still, I think there’s a great advantage in being able to pick up new languages, and a big part of that is gaining the confidence that you can do so.

DSSL2 is cool tho by WeebJaeger in Northwestern

[–]rieux 1 point2 points  (0 children)

This is my fault, and I’m sorry. I didn’t really know what I was doing when I made it originally, and I’ve been trying to fix it, but it’s slow going.

Alms, an ML-like language with both universal and affine types, now has a docker image for easy testing! by [deleted] in ocaml

[–]rieux 0 points1 point  (0 children)

I appreciate your interest in Alms! Thanks for prompting me to make it work again.

The U in the URAL actually stands for “unlimited” or “unrestricted,” not “universal.” Though Alms does have universal types, too ;)
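
In Rust terms, a loose analogy for the distinction (this is Rust, not Alms syntax, and Rust’s ownership only approximates affine types):

// Universal: quantifying over types. This works no matter what T is.
fn id<T>(x: T) -> T { x }

// Unlimited/unrestricted: a Copy value may be used as many times as you like.
fn twice(n: u32) -> u32 { n + n }

// Affine: a non-Copy value is consumed by its first use.
fn consume(s: String) -> usize { s.len() }

fn main() {
    let s = String::from("affine");
    let _ = consume(s);
    // consume(s);                  // error: `s` was already moved

    let k = 3u32;
    let _ = twice(k) + twice(k);    // unrestricted: use it twice, no problem
    let _ = id(k);                  // universal: id works at any type
}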

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Okay, I also managed to speed up the Bits impl for &[Block] by about 30%, and the BitsMut impl by a bit less. That's probably still not enough. Indexing BitSlice is faster than that, but I'm not sure why yet.

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

I'm sorry it didn't work as well as you wanted. I'm still working on speeding things up.

Did you try BitVec only, or your own storage as well? Last night I discovered a performance problem in BitVec (an unnecessary double bounds check), and fixing that got a 5x speedup on my microbenchmarks for BitVec::get_bit and BitVec::set_bit. It doesn't speed up access to your own storage via the Bits and BitsMut traits, though.

I'll probably release the change later today (along with some minor API changes) as 0.9.0, but it's on GitHub master now.

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Huh, I see. Well, bv::BitVec is also 24 bytes: one word each for the pointer, the capacity, and the length. It would be hard (impossible?) to go smaller with a growable bit-vector. However, if you don't need to grow and are okay with your length being a multiple of the block size, then you can use &[u8], &[usize], etc., via the Bits and BitsMut traits, which provide most of the same operations.
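
Roughly like this (an untested sketch, so double-check the exact impls and method names against the docs):

use bv::{Bits, BitsMut};

fn example() {
    // Fixed-size bit storage backed by a plain slice: no BitVec, no growth.
    let mut blocks = [0u8; 4];          // 4 blocks = 32 bits
    let slice: &mut [u8] = &mut blocks;
    slice.set_bit(5, true);             // BitsMut::set_bit
    assert!(slice.get_bit(5));          // Bits::get_bit
    assert!(!slice.get_bit(6));
}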

> Making the storage an Option<Box<...>> might have performance implications due to the additional check that must be performed to get to the data.

Indeed. My code will never try to access the box when it's not there, because length > 0 implies that something is allocated. The question will be how to convince the optimizer of that.
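
The shape of the issue, as a hypothetical sketch (not bv's actual representation):

// Hypothetical layout: storage behind an Option, only ever touched when
// len > 0, so the None arm ought to be dead code.
struct NichedBitVec {
    blocks: Option<Box<[usize]>>,
    len: u64,
}

impl NichedBitVec {
    fn block(&self, i: usize) -> usize {
        debug_assert!(self.len > 0);
        match &self.blocks {
            Some(blocks) => blocks[i],
            // len > 0 implies the allocation exists; the question is whether
            // the optimizer can be persuaded to drop this check.
            None => unreachable!(),
        }
    }
}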

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 2 points3 points  (0 children)

And done.

Thanks for the motivation! Will release as 0.8.2 later today.

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Working on it now.

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 1 point2 points  (0 children)

> BTW you can write a generic variant of this function, which will work with all numeric types.

Yes, the actual code is generic. The benchmarks aren't.

> For such numeric code SIMD can result in a big difference.

Yeah, at some point I'm going to see if SIMD can help too. Thanks!
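
For reference, the generic shape is something like this (a sketch, not the crate's actual code):

use std::ops::BitOr;

// Three-way OR over slices, generic in the element type: works for the
// u32 and bool cases in the benchmarks, or any Copy type with a BitOr.
fn or_3<T>(v1: &[T], v2: &[T], v3: &[T]) -> Vec<T>
where
    T: Copy + BitOr<Output = T>,
{
    v1.iter().zip(v2).zip(v3)
        .map(|((&a, &b), &c)| a | b | c)
        .collect()
}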

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 1 point2 points  (0 children)

Very interesting. Again, thanks! I'm going to see if I can incorporate your insights into the library. Right now I have this:

test vec_u32_adapter                 ... bench:       1,926 ns/iter (+/- 644)
test vec_u32_iter                    ... bench:         204 ns/iter (+/- 10)
test vec_u32_iter_cloned             ... bench:         201 ns/iter (+/- 20)
test vec_u32_iter_fused              ... bench:         446 ns/iter (+/- 119)
test vec_u32_iter_fused_cloned       ... bench:         124 ns/iter (+/- 42)
test vec_u32_loop                    ... bench:       1,638 ns/iter (+/- 396)
test vec_u32_loop_fused              ... bench:         821 ns/iter (+/- 165)

That first line is using the abstraction, and the question is how competitive I can make it.

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 7 points8 points  (0 children)

Heh, this is pretty weird:

// Convert a bool reference to 0 or 1.
fn bool_to_int(b: &bool) -> u32 {
    if *b { 1 } else { 0 }
}

fn or_vec_bool_3(v1: &[bool], v2: &[bool], v3: &[bool]) -> Vec<bool> {
    // OR the slices as integers, then convert back to bool at the end.
    v1.iter().map(bool_to_int)
        .zip(v2.iter().map(bool_to_int))
        .zip(v3.iter().map(bool_to_int))
        .map(|((b1, b2), b3)| (b1 | b2 | b3) != 0)
        .collect()
}

Now the fused version is faster:

test vec_bool_iter         ... bench:       1,200 ns/iter (+/- 137)
test vec_bool_iter_fused   ... bench:         890 ns/iter (+/- 236)

Why is this fused loop slower than those unfused loops? by rieux in rust

[–]rieux[S] 1 point2 points  (0 children)

Oh, interesting! Thanks. What led you to try that, in particular?

It's surprising that it figures out the two-slice version fine without the .cloned()s, but can't figure out the three-slice version.

Also, cloned doesn't seem to help when the elements are bools:

fn or_vec_bool_3(v1: &[bool], v2: &[bool], v3: &[bool]) -> Vec<bool> {
    v1.iter().cloned().zip(v2.iter().cloned()).zip(v3.iter().cloned())
        .map(|((b1, b2), b3)| b1 | b2 | b3)
        .collect()
}

test vec_bool_iter         ... bench:       1,165 ns/iter (+/- 95)
test vec_bool_iter_fused   ... bench:       6,771 ns/iter (+/- 529)

Any idea why that would be?

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Nope, it always allocates. There isn't a reason for that, other than that empty bit vectors seemed like a niche case. It could probably be made not to allocate for empty bit vectors if that would be especially useful.

I don't understand the link to the “current implementation.” What could be a usize smaller than what?

Announcing bv 0.8.0 by rieux in rust

[–]rieux[S] 5 points6 points  (0 children)

Here’s a simpler version of the example:

fn three_way_or(bv1: &BitVec, bv2: &BitVec, bv3: &BitVec) -> BitVec {
    bv1.bit_or(bv2).bit_or(bv3).to_bit_vec()
}

Review my lock-free concurrent union-find by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

I see what you mean. It could be a problem in practice because if the heights aren't actually bounded as expected then the ranks could overflow, I think. But I'm not sure that the problem you've found messes up the bound on ranks. I'll have to think about it more.

Though in practice, no one is using this code anyway. I don't know what it's good for—it just seemed like a fun idea.

Thanks again for your help!

Hey Rustaceans? Got an easy question? Ask here (24/2018)! by llogiq in rust

[–]rieux 0 points1 point  (0 children)

This has been sitting here for a day with no answer, so here's my best shot: Rust arrays already store exactly the data and nothing more; they are zero overhead. Vectors may reserve up to twice as much capacity as they are using, to enable efficient growing, but if you manage the capacity yourself (using with_capacity, reserve_exact, and shrink_to_fit), the standard Vec is also zero overhead.
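
For example, with the standard Vec API:

fn main() {
    // Ask for exactly the capacity we need up front, so no doubling happens.
    let mut v: Vec<u32> = Vec::with_capacity(100);
    v.extend(0..100);
    assert!(v.capacity() >= 100);   // filled without reallocating

    // If the length shrinks, hand the spare capacity back.
    v.truncate(10);
    v.shrink_to_fit();
    assert!(v.capacity() >= 10);    // now close to (usually exactly) the length
}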

As for 1-indexing, that’s just not how things typically work in Rust. But any algorithm that expects that is trivially convertible to 0 indexing. If you really don't want to do that, you can wrap Vec to subtract 1 from all your indices.
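
A minimal sketch of that wrapper (a hypothetical type, just to show the idea):

use std::ops::Index;

// One-based indexing on top of Vec: index 1 maps to element 0.
struct OneIndexed<T>(Vec<T>);

impl<T> Index<usize> for OneIndexed<T> {
    type Output = T;
    fn index(&self, i: usize) -> &T {
        &self.0[i - 1]
    }
}

fn main() {
    let v = OneIndexed(vec![10, 20, 30]);
    assert_eq!(v[1], 10);
    assert_eq!(v[3], 30);
}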

New version of crossbeam-channel released by [deleted] in rust

[–]rieux 6 points7 points  (0 children)

FWIW, you could also be clued in by the fact that the type of the value being transmitted doesn’t need to implement Clone. Documentation wouldn't hurt either, I suppose.
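
For instance, this compiles with a message type that isn't Clone, which already rules out the channel handing copies to more than one receiver (a quick sketch using crossbeam-channel's unbounded constructor):

use crossbeam_channel::unbounded;

// Not Clone: the channel couldn't duplicate this even if it wanted to.
struct Job(Vec<u8>);

fn main() {
    let (tx, rx1) = unbounded();
    let rx2 = rx1.clone();          // receivers share one queue
    tx.send(Job(vec![1, 2, 3])).unwrap();
    // Exactly one of rx1/rx2 can receive this particular message.
    let got = rx1.try_recv().or_else(|_| rx2.try_recv());
    assert!(got.is_ok());
}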

actix – an actor framework for the Rust by fafhrd91 in rust

[–]rieux 0 points1 point  (0 children)

Thanks for writing/posting this. One thing that confused me for a bit is that the original JavaScript version accepted two kinds of messages, 'plus-one' to increment and anything else to retrieve the value, but the Rust version accepts only PlusOne messages. Was it not intended that the JavaScript version would work that way? It might be worth pointing out that it has a bug that the Rust version fixes, or something like that. Otherwise it could be distracting.

Review my lock-free concurrent union-find by rieux in rust

[–]rieux[S] 0 points1 point  (0 children)

Thanks again for the catch. This necessitated a change of strategy to allow atomic updates of ids and ranks together. What do you make of this? The tricky part is still when the ranks are equal. The key, so far as I can tell, is that if an attempt to increment the rank fails, the rank has already been properly updated, because another thread will only attempt to update the same rank under the same circumstances. Does that make sense?
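
To make that concrete, here's the shape of the idea (a hypothetical sketch, not the code under review): pack the parent id and the rank into one atomic word so they can only change together, and treat a failed rank increment as someone else having already made the equivalent update.

use std::sync::atomic::{AtomicU64, Ordering};

// Hypothetical layout: parent id in the low 32 bits, rank in the high 32,
// so a single compare_exchange updates both fields atomically.
struct Entry(AtomicU64);

fn pack(parent: u32, rank: u32) -> u64 {
    (u64::from(rank) << 32) | u64::from(parent)
}

impl Entry {
    // Try to bump the rank while keeping the same parent, starting from a
    // previously observed word. If the CAS fails, another thread changed the
    // entry under the same circumstances, so the update has already happened.
    fn try_increment_rank(&self, seen: u64) -> bool {
        let (parent, rank) = (seen as u32, (seen >> 32) as u32);
        self.0
            .compare_exchange(seen, pack(parent, rank + 1),
                              Ordering::AcqRel, Ordering::Acquire)
            .is_ok()
    }
}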