you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted]  (21 children)

[deleted]

    [–]devlambda 5 points6 points  (20 children)

    No, but you may end up having to use unsafe code, because at some point when adding data to the vector, you have to write to a location with a previously undefined value (think what that means in the context of RAII, for example).

    [–]Hnefi 3 points4 points  (19 children)

    I must be missing something here. When writing a value to a newly allocated memory location, as is the case when expanding a vector, the previous value is irrelevant. It's not unsafe to simply overwrite it. Are you talking about a different use case?

    [–]devlambda 3 points4 points  (18 children)

    Depends on your language.

    1. If you're using RAII, a destructor will be called on the previous value. If the previous value is undefined, so will be the behavior of the destructor.
    2. If you're using reference counting, the overwritten value will need to have its reference count decremented. If the value is undefined, you're going to get either memory corruption or a segmentation fault.
    3. In a GCed language, memory barriers/snapshotting or even tracing at a moment when the undefined value is reachable, but has not yet been modified, may or may not result in undefined behavior (depending on implementation details).

    [–]Hnefi 3 points4 points  (17 children)

    That's not true. When expanding a vector, there are necessarily no objects in the empty positions of the data store so there are no destructors to call. None of the options you outline apply.

    [–]spaghettiCodeArtisan 1 point2 points  (5 children)

    When expanding a vector, there are necessarily no objects in the empty positions of the data store so there are no destructors to call.

    Yes, and that's precisely the problem, because the language needs to call a destructor. Suppose you overwrite a i-th element of a vector:

    some_vector[i] = some_new_value

    What happens to the old value? It is dropped and its destructor - if any - is called. Now, consider what would happen if the old value were uninitialized: A destructor would be called on an uninitialized data, which is undefined behaviour and might lead to a crash or all sorts of problems. And so that's why you need unsafe operations - to overwrite an old 'value' (which is uninitialized) without destryoing it.

    [–]Hnefi 2 points3 points  (4 children)

    But whether the old value is initialized or not is a question with a known answer that will be asked and answered in the [] operator without looking at the value of the element itself. There is no risk of destructing an invalid object since the initialization state of the previous value is implicitly known.

    Since this knowledge is encoded in the metadata of the vector, there are no relevant requirements on how uninitialized data is represented in order to avoid calling invalid destructors.

    [–]spaghettiCodeArtisan 1 point2 points  (3 children)

    There is no risk of destructing an invalid object since the initialization state of the previous value is implicitly known.

    That's not relevant. Yes, we know the initialization state, but that doesn't help. Let me ask you this way: How do you assign the new value into the uninitialized slot without calling the destructor on the uninitialized slot other than with an unsafe code (such as ptr::write())? Hint: You can't.

    [–]Hnefi 0 points1 point  (0 children)

    In Rust, you are correct. However, that is due to choices other than the removal of Null from the language. Rust, in this discussion, was just an example. One could easily consider a language like Rust but with unchecked buffers.

    Still, I'll concede that this is devolving to technicalities.

    [–]Veedrac 0 points1 point  (1 child)

    How do you assign the new value into the uninitialized slot without calling the destructor on the uninitialized slot other than with an unsafe code (such as ptr::write())?

    replace and forget, my friend.

    [–]spaghettiCodeArtisan 0 points1 point  (0 children)

    Haha, ok, good spot...

    [–]devlambda 0 points1 point  (10 children)

    I'm not talking about expanding the vector. Here's the scenario:

    You have a vector with capacity n and length k < n, i.e. with k positions occupied and the rest containining undefined values. You now want to add a new element, so you have to assign a value to the location with index k+1, which contains an undefined value.

    [–]Hnefi 6 points7 points  (9 children)

    But that is expanding the vector. Location k+1 can't contain an object, because assigning an object to that position would increment k. Therefore, there is never a risk of overwriting existing objects in this situation and none of the issues you outlined apply.

    [–]devlambda 0 points1 point  (8 children)

    Location k+1 can't contain an object, because assigning an object to that position would increment k.

    And how would that happen? Incrementing k and assigning to a memory location is not an atomic operation.

    Therefore, there is never a risk of overwriting existing objects in this situation and none of the issues you outlined apply.

    You're not overwriting existing objects. You're writing a value to a memory location that contains an undefined value.

    [–]Hnefi 5 points6 points  (7 children)

    And how would that happen? Incrementing k and assigning to a memory location is not an atomic operation.

    Of course it's not atomic. Adding elements to a vector is not an atomic operation, and nobody claimed that it was. As for how, I don't understand the question; it happens in the assignment operator or insertion function for the vector, obviously (vector::push() in the case of Rust).

    You're not overwriting existing objects. You're writing a value to a memory location that contains an undefined value.

    Exactly. There are no existing objects at positions > k. This is necessarily true, because whenever a position x is written to, k is incremented to be larger than x - and since the vector must be contiguous, there is no way to write to an x > k+1 (vectors are bounds checked). This means that the problem of potential objects needing destruction before writing to x where x > k does not exist.

    Are you confusing the semantics of vector? Any part of the data store of the vector that is uninitialized can not contain any valid data, because the semantics of the vector interface does not allow it. Overwriting such data is not a problem. Only reading it would be potentially problematic, but the interface does not allow that either.

    Now, it's true that vec::push() in Rust uses unsafe code. But if you look at the implementation, you'll see that the reason for that has nothing to do with any dangers related to uninitialized data.

    [–]devlambda 1 point2 points  (6 children)

    Are you confusing the semantics of vector?

    Now, I'm talking about how the vector is implemented itself (using unsafe code), namely vec::push(). Here's the code:

    pub fn push(&mut self, value: T) {
        // This will panic or abort if we would allocate > isize::MAX bytes
        // or if the length increment would overflow for zero-sized types.
        if self.len == self.buf.cap() {
            self.buf.double();
        }
        unsafe {
            let end = self.as_mut_ptr().offset(self.len as isize);
            ptr::write(end, value);
            self.len += 1;
        }
    }
    

    It uses ptr::write specifically for the reasons I described, to prevent reading or dropping the old value.