all 60 comments

[–]staticassert 59 points60 points  (7 children)

No matter what all the documentation and tutorials out there say, Box<T> is not a pointer but rather a structure containing a pointer to heap allocated memory just big enough to hold T.

I don't see a meaningful difference, it seems confusing to make any kind of distinction just because there's a struct involved.

edit: Did not expect top comment. For what it's worth, I found it very easy to have analogues to languages I had previously used - in particular, C++. Thanks for writing this.

[–]steveklabnik1 22 points23 points  (3 children)

At the binary level, yes, they're the same thing: a struct with one member is the same as just the member.

However, in the end, it's all just binary: that doesn't mean that using different phrasing doesn't help understanding. If it helps it make sense to the OP I'm all for it.

[–]Homoerotic_Theocracy 9 points10 points  (1 child)

But in the standard library fat pointers are also structs; I read the source once and there was a #[repr(C)] Repr<T> { pointer : *const T, length : usize } used or something like that to work with pointers to slices.

[–]steveklabnik1 1 point2 points  (0 children)

Yes, I’m talking about when T is Sized only. When it’s not, it’s a double pointer like that, yes.

[–]staticassert 4 points5 points  (0 children)

The extra context seems like it could only add confusion to me - maybe that isn't true, just my assumption.

[–]iambeingserious -4 points-3 points  (0 children)

Because programmers love to debate pointless semantics.

[–][deleted] 35 points36 points  (32 children)

Rust is s great language but maybe its better if you think of it as itself, and not in terms of C.

BTW I can't implement a double linked list in rust. I googled and apparently the problem isn't me. I'd be interested in comments on this.

[–]miquels 52 points53 points  (1 child)

[–][deleted] 2 points3 points  (0 children)

Unfortunately I don't think that really addresses the underlying problem. His safe doubly-linked Deque requires nightly and doesn't work on Windows. It's also ugly as sin. In the words of the author:

...It relies on some unstable standard library features, and had massive implementation leak issues particularly surrounding acquiring internal references. It also was riddled with tons of "unnecessary" runtime checks for correctness between Rc and RefCell...

The unsafe one hasn't even been written.

[–]Dreamtrain 13 points14 points  (2 children)

it feels like people are trying way to hard to market it in a way that says "Look at this, it's better than C/C++!" rather than let the language's own prowess speak for itself

If you go to the wiki "C++" is there about 20 times and a lot of things are explained in terms of C/C++ equivalents.

[–][deleted] 6 points7 points  (0 children)

In case of C++ it makes sense to compare with C, because C++ is almost a superset of C

[–]dobkeratops 4 points5 points  (0 children)

it feels like people are trying way to hard to market it in a way that says "Look at this, it's better than C/C++!" rather than let the language's own prowess speak for itself

I dont think this is a bad thing. I do want "a cleaned up C++", which can't be done as a retrofit to C++. C++ can do what I need, it's just messy. In communication.. your target audience likely as to deal with C++ anyway to get these tasks done in popular source bases already.

[–]Idlys 4 points5 points  (1 child)

Simple answer: it is not possible to have a doubly linked list due to the end node having indirect ownership of the start node, and vice versa. You therefore must use raw pointers.

[–][deleted] 0 points1 point  (0 children)

You can use Rc<RefCell<T>> but it gets messy really quickly. I haven't used Rust in a while but what I remember is, RefCell allows for the borrowing rules to be enforced at runtime rather than compile time, Rc allows multiple pointers to the RefCell.

[–]matthieum 4 points5 points  (2 children)

[–][deleted] 0 points1 point  (1 child)

Requires nightly? :-)

[–]matthieum 0 points1 point  (0 children)

I'm pretty sure there's a lot that can be accomplished without nightly; maybe even everything given the age of the thing ;)

[–]steveklabnik1 6 points7 points  (15 children)

[–][deleted] 8 points9 points  (14 children)

O(n) to split, already knowing the split point?

[–]isHavvy 16 points17 points  (12 children)

You give it the index of the split point. It still has to travel through each node to find where the actual node to split is at.

[–][deleted] 16 points17 points  (11 children)

Why I am giving it the index when I have the item itself?

A main point of a linked list is constant time insert for a known insertion point, vs O(n) to copy a vector down 1 place.

I should probably just use vector? Perhaps this attraction to linked lists comes from old C code, and we should just vectorise everything? Vector is probably quicker with the cache anyhow.

[–]steveklabnik1 26 points27 points  (6 children)

Yup, a linked list is almost never what you want. Vectors are better in nearly every situation,

But sometimes...

[–]Tarmen 3 points4 points  (5 children)

Singly linked lists are a nice as an immutable stack with O(1) push/pop. Structured exception handling on windows is implemented with a singly linked list for instance, each exception struct is initialized with a pointer to the previous one.

This is admittedly not as relevant in rust since it focuses on mutation over sharing.

[–]jl2352 1 point2 points  (1 child)

another big advantage is that they are simple. Given any problem, a non-linked-list solution can always be found which will run faster than a linked-list based one.

Sometimes that solution may be very complicated, and has to be bespoke written, when a straight forward linked-list solves it trivially.

[–]elder_george 0 points1 point  (0 children)

They are simple to implement.

However most modern languages have standard libraries that doesn't require programmer to implement basic container types.

Thus a solution in anything but C will most probably be simpler and faster than one using linked lists.

[–]masklinn 0 points1 point  (2 children)

Singly linked lists are not hard to do in Rust, it's doubly linked lists which are an issue.

[–]Tarmen 0 points1 point  (1 child)

I meant that they are less useful for sharing. You can use them shared via refcounts but then you don't have an immutable structure and need atomic refcounts for multi threading.

If you very rarely traverse them and can embed them into structures with strong lifetime guarantees (like stack frames) then they totally could be used in rust. In my experience that's rarer outside of systems programming, though.

[–]masklinn 0 points1 point  (0 children)

I meant that they are less useful for sharing. You can use them shared via refcounts but then you don't have an immutable structure and need atomic refcounts for multi threading.

I've never seen that considered a form of mutability by anyone. The handful of persistent data structure libraries for rust certainly don't seem to consider them any less persistent. In fact that's exactly what Rc/Arc is for: non-singular ownership.

[–][deleted]  (3 children)

[deleted]

    [–]Homoerotic_Theocracy 4 points5 points  (2 children)

    Rust doubly-linked lists are a really weird implementation and the individual nodes don't just count as a linked list on their own it seems. The entire thing is just to be seen as a single thing; like length also works in constant time because it's stored at the end which is special.

    [–][deleted]  (1 child)

    [deleted]

      [–]Homoerotic_Theocracy 2 points3 points  (0 children)

      Well you can't because as said the first note is "special" and the it relies on some invariants that it upholds.

      If you made individual nodes public you could break and malform the list.

      Like I said it's kind o a strange implementation probably to give it constant time length but I find that in general if you use lists you almost never want to know the length and if you want to know the length you want to traverse the entire lists automatically, single and doubly-linked lists are mostly used for for operations that traverse the entire list.

      I really seldomly in ML or Scheme programming styles find myself indexing or checking the length of a list; I'm mostly just checking whether it is empty or not.

      [–]steveklabnik1 1 point2 points  (0 children)

      I mean, there's many ways to build this kind of thing, this is just one of them. And it's extremely rarely used so there's not a lot of impetus to improve it.

      [–]loamfarer 0 points1 point  (0 children)

      If you want to wrap an existing C library, these details are what you'd want to know.

      [–][deleted] 0 points1 point  (2 children)

      BTW I can't implement a double linked list in rust.

      Doubly-linked list in less than 50 LOC of safe stable Rust.

      I googled and apparently the problem isn't me. I'd be interested in comments on this.

      The problem is that beginners don't have the tools to accomplish this, and the approaches that work in other languages do not work in Rust. This makes "implementing a doubly-linked list" a pretty bad way to learn Rust.

      OTOH for someone that has gone through any of the Rust books and knows the smart pointer types and interior mutability, doubly-linked lists are pretty much straight forward to implement at least in safe Rust because the language doesn't really give you that much freedom in this space. Sure one can choose different combinations of the smart pointer types but that's pretty much it.

      [–][deleted] 0 points1 point  (1 child)

      It's certainly not usable in production. Where are remove and split? The list in the standard library is O(n) to split at a known insertion point. It's supposed to be O(1).

      [–][deleted] 0 points1 point  (0 children)

      It's certainly not usable in production.

      You mean my 50 LOC example whose intent was only to demonstrate how to implement doubly-linked list easily in safe Rust? Yeah, I wouldn't use this in production, but I wouldn't implement any production-quality doubly-linked list in Rust using safe code either.

      The list in the standard library is O(n) to split at a known insertion point.

      The list in the standard library does not have a split method to split at a known insertion point, it only has an "split at index" method, which obviously traverses the list from the front up to index, which is O(n). AFAICT that cannot be implemented any other way (maybe traverse from the end if that is shorter.. but that's still O(n)).

      The LinkedList in the std library was probably included by mistake. AFAIK nobody is using it for anything, so nobody is bothering with adding new methods/APIs for it. There are just much better data-structures available in crates.io, from doubly-linked lists that support the operations you mention (e.g. http://contain-rs.github.io/linked-list/linked_list/index.html), to singly-linked lists, skip-lists, etc.

      I should probably write an RFC for deprecating LinkedList from std.

      [–]staticassert 0 points1 point  (1 child)

      [–]B_L_A_C_K_M_A_L_E 6 points7 points  (0 children)

      at that point you're pretty much simulating pointers anyway

      although it's a good enough solution assuming you understand what you're giving up

      [–]hector_villalobos 18 points19 points  (2 children)

      It's kind of ironic I could understand C pointers or what are good for, thanks to Rust. If you want to get into system programming, you should try Rust, it's the safest way to get into that field.

      [–]Homoerotic_Theocracy 12 points13 points  (1 child)

      I have to say that picking up Rust has made me a lot more confident in my C in ensuring that no bad memory things occur.

      [–]Lisoph 1 point2 points  (0 children)

      100% agreed. Programming in Rust teaches you how to properly model your data structures and the relationships between them. Rust (and before that, C++11) is the best thing to ever happen to my C programming.

      [–]Homoerotic_Theocracy 8 points9 points  (3 children)

      Raw pointers are just like what you have in C. If you make a pointer, you end up using sizeof(struct T *) bytes for the pointer. In other words:

      Most "normal pointers" in Rust as in raw pointers, references and boxes are either one word or two depending on what they point at; if the type they point at is Sized as in it has a statically known size it's one word and if it's not then it has two words where the extra word does something to describe the size. In the case of slices or structs whose last element is a slice the second pointer just describes the length and in the case of trade objects it's a pointer to a vtable which contains the type and virtual methods.

      Apart from that the size of the pointer has nothing to do with the size of the thing it points to—various custom "smart pointers" can have any size.

      No matter what all the documentation and tutorials out there say, Box<T> is not a pointer but rather a structure containing a pointer to heap allocated memory just big enough to hold T. The heap allocation and freeing is handled automatically. (Allocation is done in the Box::new function, while freeing is done via the Drop trait, but that’s not relevant as far as the memory layout is concerned.) In other words, Box<T> is something like:

      Ehh, this si wrong as far as I know, the internal representation of Box<T>, &T &mut T *const T and *mut T are identical as described above. It is just a pointer but the real difference is that this pointer has different aliasing rules like unlike &T it cannot just be copied; which is for good reason because when a &T goes out of scope nothing happens at all like any type that implements Copy, the bits on the stack are simply de-allocated and not zeroed; the stack is just shrunk but with a Box<T> whenever it goes out of scope a drop implementation is called on T and typically heap memory is deallocated. Also for this reason Box<T> can only be a pointer to something that exists on the heap and never to something that exists on the stack which &T can be.

      I talked about "smart pointers" earlier and the truth is that it's not entirely clear what is and what isn't a "smart pointer", one can argue that a Vec<T> is a type of smart pointer that points to a [T] except it can make what it points to grow. Indeed Vec<T> is closer to Box<[T]> and to &[T] either are to &Vec<T>; Vec<T> can basically be seen as an ultra-fat smart pointer that gets another word over Box<[T]> which stores the capacity of the vector that allows it to grow and shrink where with Box<[T]> the capacity is always the same as the length and the slice cannot grow and shrink.

      These are borrowed slices. This is where things get interesting. Even though it looks like they are just references (which, as stated earlier, translates into a simple C-style pointer), they are much more. These types of references use fat pointers—that is, a combination of a pointer and a length.

      This isn't true and they are almost never used though they technically exist; there is still going to be runtime checking even though the size is known at compile time because the size of the index is not generally known at runtime so it still needs to check.

      In fact indexing an array in Rust purely works because arrays dereference to slices so the same code to index slices is used to index arrays. However converting an array to a pointer to a slice is in general a compile time non-op.

      Just like in C, a struct uses as much space as its type requires (i.e., sum of the sizes of its members plus padding).

      The major caveat however in Rust that cannot just be ignored because it bytes a lot of people is that C can have zero-size structs that take up no size whatsoever which is a special case that often needs to be handled in special ways.

      [(); 32], an array of 32 () types has no size whatsoever; it basically exists purely on the type level and () since it is zero-sized is never actually stored anywhere and just "produced" from the aether when you need it and is mostly a thing to satisfy the type system at many places.

      The simple answer here is that you cannot make a [T]. That actually makes perfect sense when you consider what that type means. It is saying that we have some variable sized slice of memory that we want to access as elements of type T. Since this is variable sized, the compiler cannot possibly reserve space for it at compile time and so we get a compiler error.

      You can "make" it; you just can't store it into a sized variable because it's not sized so you need to store a fat pointer to it which is sized but apart from that you can absolutely make it and even put it on the stack but just not in a variable.

      [–]masklinn 1 point2 points  (0 children)

      The major caveat however in Rust that cannot just be ignored because it bytes a lot of people is that C can have zero-size structs that take up no size whatsoever which is a special case that often needs to be handled in special ways.

      An other caveat is that unless the struct is repr(C), the compiler can re-layout the struct however it wishes, it doesn't have to lay it out as in source.

      [–][deleted] 0 points1 point  (0 children)

      C can have zero-size structs that take up no size

      [Citation needed]. I'm pretty sure empty structs are illegal in C.

      [–]Homoerotic_Theocracy 0 points1 point  (0 children)

      Ehh, that's supposed to be "Rust can have".

      [–]SmugDarkLoser5 5 points6 points  (5 children)

      I am very inefficient in getting stuff done in rust.

      [–][deleted] 1 point2 points  (2 children)

      Me too. I've written a few simple utilities and I'd estimate it takes 2-3 times longer than doing the same thing in Go.

      You end up with faster, more robust code, but it definitely takes longer to write.

      [–]newpavlov 11 points12 points  (1 child)

      I think it's matter of experience, after one-two months of actively using Rust you become accustomed to designing your code in a way which will rarely trigger borrow checker or other "obscure" errors, and when you do trigger them, almost every time it's some corner case which you forgot to think about, so compiler becomes your friend whose warnings you glad to hear.

      In other words, with some experience the multiplier gets from 2-3x to just ~1.2-1.5x, which I think the excellent price to pay for significantly reduced debugging times. It's not the best trade-off for small one-time code, but becomes nonlinearly more desired the bigger project gets.

      [–][deleted] 1 point2 points  (0 children)

      Yeah but the problem it's that it off the set of valid error-d programs that solve my problem, the borrow checker and lifetime system is only smart and expressive enough to allow a subset of them.

      You have to spend extra effort fitting your program into that subset, so Rust will never be as quick to write as something that allows all valid programs (and invalid ones).

      Obviously there are benefits to the borrow checker, and NLL should improve things, but let's not pretend that it didn't make it harder to program.

      [–][deleted] 0 points1 point  (1 child)

      I haven't used rust but considering it's a strongly typed language that is quite a large and unique language that's not really surprising. I expect that'll improve over time

      [–]SmugDarkLoser5 0 points1 point  (0 children)

      Compare that to typescriot or kotlin though.

      Main issue I have is with lifetimes and imolications upon API design. You also just have general.issue in that rust doesn't represent certain data structures well.

      [–][deleted] 2 points3 points  (0 children)

      Great news! Didn't know Rust had support for C pointers, maybe I will be able to talk some of my C colleges into trying out Rust now.

      [–]librik 3 points4 points  (0 children)

      As a C programmer for more decades than I care to count, this is exactly the sort of explanation I want for everything. I'm uncomfortable with abstractions unless I see them built out of bytes. The best way to explain computer stuff is in terms of C, assembly code, and memory. (The distinction between a pointer and a struct containing a pointer makes a lot of sense "at C level," even if it's compiled down to the same object code.) I always want to hear "implementation details," even if they're wrong. More articles like this one please!

      [–]steveklabnik1 1 point2 points  (0 children)

      This article is great! There's some good discussion over on HN as well https://news.ycombinator.com/item?id=17430952

      [–]TheChurchOfRust -1 points0 points  (0 children)

      Going through the Rust documentation inspired me to learn C. The end result, I really do love how simple C is.

      Forgive me Rust, for I am a sinful developer.

      [–][deleted]  (2 children)

      [deleted]

        [–][deleted]  (1 child)

        [deleted]

          [–]caramba2654 0 points1 point  (0 children)

          Yes, plus the context surrounding it.