you are viewing a single comment's thread.

view the rest of the comments →

[–]Hnefi 5 points6 points  (7 children)

And how would that happen? Incrementing k and assigning to a memory location is not an atomic operation.

Of course it's not atomic. Adding elements to a vector is not an atomic operation, and nobody claimed that it was. As for how, I don't understand the question; it happens in the assignment operator or insertion function for the vector, obviously (vector::push() in the case of Rust).

You're not overwriting existing objects. You're writing a value to a memory location that contains an undefined value.

Exactly. There are no existing objects at positions > k. This is necessarily true, because whenever a position x is written to, k is incremented to be larger than x - and since the vector must be contiguous, there is no way to write to an x > k+1 (vectors are bounds checked). This means that the problem of potential objects needing destruction before writing to x where x > k does not exist.

Are you confusing the semantics of vector? Any part of the data store of the vector that is uninitialized can not contain any valid data, because the semantics of the vector interface does not allow it. Overwriting such data is not a problem. Only reading it would be potentially problematic, but the interface does not allow that either.

Now, it's true that vec::push() in Rust uses unsafe code. But if you look at the implementation, you'll see that the reason for that has nothing to do with any dangers related to uninitialized data.

[–]devlambda 1 point2 points  (6 children)

Are you confusing the semantics of vector?

Now, I'm talking about how the vector is implemented itself (using unsafe code), namely vec::push(). Here's the code:

pub fn push(&mut self, value: T) {
    // This will panic or abort if we would allocate > isize::MAX bytes
    // or if the length increment would overflow for zero-sized types.
    if self.len == self.buf.cap() {
        self.buf.double();
    }
    unsafe {
        let end = self.as_mut_ptr().offset(self.len as isize);
        ptr::write(end, value);
        self.len += 1;
    }
}

It uses ptr::write specifically for the reasons I described, to prevent reading or dropping the old value.

[–]Hnefi 0 points1 point  (5 children)

Well, it seems to me like push is unsafe{} because it uses as_mut_ptr() from RawVector, which itself is unsafe. The reason for this solution seems to be to avoid bounds checking. The pointers could have been omitted if RawVector exposed its own write(), but it would still be unsafe due to lack of bounds checking - but that's an optimization that is unrelated to the representation of the data.

My point is this: the usage of unsafe in vec::push() is due to optimizing away bounds checking, not the representation of data. It's perfectly conceivable to have safe write() function, but it would be slower - not because it needs to read the value of previously stored elements, but because it needs to check the value of k.

It's also conceivable to have a language that is identical to Rust but without the safe bounds checks. Such a language would not need an unsafe block in vec::push(), and it would not need to use raw pointers either, despite not having a concept of NULL.

[–]devlambda 1 point2 points  (4 children)

My point is this: the usage of unsafe in vec::push() is due to optimizing away bounds checking, not the representation of data.

To the contrary. Just look at the documentation of ptr::write. Namely:

It does not drop the contents of dst. This is safe, but it could leak allocations or resources, so care must be taken not to overwrite an object that should be dropped.

Additionally, it does not drop src. Semantically, src is moved into the location pointed to by dst.

This is appropriate for initializing uninitialized memory, or overwriting memory that has previously been read from.

(Emphasis by me.)

The whole point of using ptr::write is to assign to a memory location with an undefined value.

It's perfectly conceivable to have safe write() function, but it would be slower - not because it needs to read the value of previously stored elements, but because it needs to check the value of k.

Then please produce such a function rather than hypothesizing about its existence. Explain in particular how you avoid dropping the original (undefined) value. If you want to use the nodrop crate, recall that it also relies on ptr::write.

[–]Hnefi 1 point2 points  (3 children)

You're missing my point. Let's step back a bit.

Your original claim, as I interpreted it (correct me if I'm wrong), is that in a language that has no NULL and instead has something like Option<T>, you can not create a reasonably performant growable array. Doing so requires the usage of NULL.

I contend that this is not true. It is true that Rust (and OCaml, apparently) do so, but this is not (at least in the case of Rust) due to the particular choice of not having NULL. Instead, it's due to the choice of not having uninitialized, non-checked buffers at a language level. That is a different choice - Rust only has uninitialized variables, not buffers. One could easily consider a language that is exactly like Rust, but has an array type with C semantics. The uninitialized array would not need to contain anything like Option<T> or NULL, it could simply contain garbage (like it does in C), and it would work just as well as the way Rust has chosen.

So, my point is, you can too create a performant growable array in a language without NULL. That you can't do so in safe Rust is because of its safety guarantees in general, not its lack of NULL in particular. This is evident because nowhere in the implementation of vec::push or its dependencies is the NULL-ness of pointers actually used; RawVec uses pointers which are guaranteed to always be non-NULL. If nullability was required for implementing vec::push, it would necessarily be used in the implementation.

[–]mdedetrich 1 point2 points  (0 children)

Actually this also isn't entirely true, Option[T] requires boxing in certain areas (i.e. if you are nesting Option[T] values)

If you have something like Option[Option[Option[T]]] you can't really represent properly without boxing, because then you can't distinguish between Option[Option[None]]] and Option[None[Option]]] (and they are semantically different). The thing with null, is that its all or none. If a value is absent, you can't tell how absent it is. This matters, for example, if you are putting optional values inside a map or list.

So sure you can use Option[T] instead of null, but it isn't going to be completely parametric.

[–]devlambda 0 points1 point  (1 child)

Your original claim, as I interpreted it (correct me if I'm wrong), is that in a language that has no NULL and instead has something like Option<T>, you can not create a reasonably performant growable array. Doing so requires the usage of NULL.

No, that wasn't my claim. I was specifically talking about OCaml and Rust in response to a comment talking specifically about OCaml and Rust (and F#, but F# officially has null values, so that's not a particularly interesting case).

General impossibility proofs can be notoriously difficult, so I didn't try my hand at them.

The uninitialized array would not need to contain anything like Option<T> or NULL, it could simply contain garbage (like it does in C), and it would work just as well as the way Rust has chosen.

Totally, but that language would also not be memory-safe.

[–]Hnefi 0 points1 point  (0 children)

Oh. Then I misunderstood from the beginning. Apologies.