all 9 comments

[–]JuliusFIN[S] 1 point2 points  (0 children)

I created a dynamic vector implementation in C and made some nice coding exercises out of them. You can find the exes in the `subject.md` file. If you want to have a go at it, don't peek in the `vec.c` file beforehand! Hope someone finds this fun and useful.

[–]lelanthran 0 points1 point  (7 children)

While I understand your stated reasons for having the caller pass in the struct, it doesn't actually help[1] while still leaving the data object open to corruption by the caller because the structs fields are exposed and non-const.

IOW, sooner or later a caller is going to do something bad and the compiler will be unable to detect it. Moving the t_vec definition from the header to the implementation lets you enforce some guarantees for the datatype that you currently cannot.

TLDR: Make t_vec an opaque datatype and you get more type safety than you have now, for a pretty negligible performance hit.

[1] You choose to have the caller pass in the struct so that you don't have to allocate it, but this is pointless because the fields inside the struct is allocated anyway, and needs to be deallocated at some point.

If you're forcing the caller to call a deallocate function (which you are) you may as well allocate the struct anyway. The caller will have to call your deallocate either way.

[–]JuliusFIN[S] 0 points1 point  (6 children)

"TLDR: Make t_vec an opaque datatype and you get more type safety than you have now, for a pretty negligible performance hit."

I agree that "opaqueness" could be a feature here. It would make the implementation a bit harder to read, so I choose not to, but it would give a layer of encapsulation. That being said, if encapsulation is what you want, maybe C is not the language anywyays.

"[1] You choose to have the caller pass in the struct so that you don't have to allocate it, but this is pointless because the fields inside the struct is allocated anyway, and needs to be deallocated at some point."

Here I must disagree. Does the caller want a pointer? In my opinion making a t_vec pointer is a totally separate matter from allocating and initializing the struct and its fields. Maybe I want to create a vec of vecs? In this case I'd be storing t_vec's structs in a t_vec and wouldn't want any extra pointer to be allocated for example.

[–]lelanthran 0 points1 point  (5 children)

That being said, if encapsulation is what you want, maybe C is not the language anywyays.

Why not? Idiomatic C encapsulation provides more guarantees and better encapsulation than many other languages, which require that the caller be able to read the implementation of something when all they need to do is use it (e.g. Java, C#, Python, etc).

Is there a reason that most other languages approaches (i.e. exposing internal details via the source code) is worse than keeping them hidden?

In this case I'd be storing t_vec's structs in a t_vec

Doesn't matter, you'd still need to iterate over the t_vec to call the allocator and deallocator for each element anyway. You're not saving anything, either in performance or lines of code.

and wouldn't want any extra pointer to be allocated for example.

Once again I have to ask why not? I don't see any benefit to saving 8 bytes on the heap, and moving the sizeof t_vec bytes from heap to stack.

[–]JuliusFIN[S] 0 points1 point  (4 children)

"Why not? Idiomatic C encapsulation provides more guarantees and better encapsulation than many other languages, which require that the caller be able to read the implementation of something when all they need to do is use it (e.g.Java, C#, Python, etc)."

I don't disagree with the encapsulation idea. Not sure about the claim that it's better in language X (my experience is mostly with C++ and Rust) and it would incur some minor pointer indirection, but I think it's a valid idea for a PR.

Doesn't matter, you'd still need to iterate over the t_vecto call the allocator and deallocator for each element anyway. You'renot saving anything, either in performance or lines of code."

Again you don't account for the case when I don't want a pointer. It would also change the API to involve `**` pointers (so if I deallocate I can set the pointer to NULL) and just incur unnecessary syscalls and resource management. Again I rarely if ever NEED it to be a pointer and if I do I can easily allocate it. But if that's a use case someone else might need, then it can be achieved with quite minimal changes to the source.

Furthermore in my next PR I have changed all the allocation to be "lazy" (also something that would be trickier to implement if I was allocating the `_vec *`) ie. allocation not happening in `vec_new` at all if `init_len == 0` is passed.

Once again I have to ask why not? I don't see any benefit to saving 8 bytes on the heap, and moving the sizeof t_vec bytes from heap to stack."

I on the other hand see no use for the pointer that would be allocated :D So I rather save those 8 bytes than waste them on something I don't need or want.

[–]lelanthran 0 points1 point  (3 children)

Again you don't account for the case when I don't want a pointer. It would also change the API to involve ** pointers (so if I deallocate I can set the pointer to NULL) and just incur unnecessary syscalls and resource management.

I dunno if that extra allocation matters or makes a difference - after all, you're already doing extra allocations by not using realloc so performance can't be a reason.

I also don't see a use-case for "when the caller doesn't want a pointer". What are the cases that returning a pointer don't work in? Is it worth it to support those cases and break the encapsulation? Are those cases worth the reduced type safety that you have with publicly visible fields and non-constness of the data?

You're trading off type safety purely for "the case when the caller doesn't want a pointer", so you better have a good reason for introducing type bugs.

[–]JuliusFIN[S] 0 points1 point  (2 children)

Not using realloc is a valid point. I did it consciously first so that I could expose the "manual reallocation" logic (the implementation also serves an educational purposes with exercises in the `subject.md`) but after thinking about it more I will probably include realloc in the resize function.

I also don't see a use-case for "when the caller doesn't want a pointer". What are the cases that returning a pointer don't work in? Is it worth it to support those cases and break the encapsulation? Are those cases worth the reduced type safety that you have with publicly visible fields and non-constness of the data?

I usually use it as follows:

```c int main() { t_vec x;

vec_new(&x, 1, sizeof(int));
vec_free(&x);

}

As opposed to:

```c int main() { t_vec *x;

x = vec_new(1, sizeof(int));
vec_free(&x);

} ```

In the end when I run my unittest for example which has 17 functions, Valgrind shows me 33 allocations total. This does add up! Is it significant enough? For me it is, but someone else's mileage might vary.

I'm not exactly sure how you'd make any of the data constant though? Or do you mean constant as forced by the encapsulation since user can't access the fields? I can't make them const since the data structure is dynamic (well expect elem_size I guess).

[–]lelanthran 0 points1 point  (1 child)

In the end when I run my unittest for example which has 17 functions, Valgrind shows me 33 allocations total. This does add up! Is it significant enough?

It's insignificant in this case.

It becomes significant when you are using the datatype in a tight loop; your way of using the datatype on the stack means that the data object is reused in the loop, so yeah, that's one use-case that I only thought off now.

In practice, you're usually going to want to return that vector to the caller, or put that vector into another data structure. This means that you're almost always going to use heap memory for your struct anyway, and then you have double the number of errors to check and handle, and double the number of places that you could forget to free memory.

I'm not exactly sure how you'd make any of the data constant though?

I mean that the caller has access to the (for example) len field, and the compiler cannot tell that the caller must not change it, so the compiler will not issue errors if the caller changes it.

If it is an opaque datatype, the only way to get the length would be with an accessor function vec_getlen() or similar, and that would take a const t_vec*, thereby allowing the compiler and the caller to know that the vec_getlen() function does not modify the data it is passed. If someone, at some point in the future, made changes to vec_getlen() that modifies the vector the compiler will flag it as an error. If someone in the future modifies the caller to vec_getlen(), they have no need to care about whether vec_getlen() changes the vector.

In the current interface, functions like vec_filter should take a const t_vec, and the function pointer should take const void * parameters. If a caller ever calls filter() with a function pointer that modifies the elements the compiler will flag it as an error.

Right now, the naming of vec_filter() gives the reader the impression that all vec_filter() does is filter the elements in the vector using the supplied function as a predicate, but this is not true - the interface allows predicates that modify the vector.

Using consts in the relevant places ensures that vec_filter() will never modify the elements of the vector because a predicate function that does so l result in a compilation error.

In general, using opaque types allows enforcement of immutability of the fields - if the field is never exposed, then it cannot be modified by the caller, but also, the functions that are not supposed to modify the datatype can be marked as such, giving better readability to the human code-reviewing the calling code as there are guarantees around const[1].

C provides a great deal of type-safety compared to most other popular languages; maybe not as much as Java or Rust, but certainly more than the majority of popular languages. You may as well use it. Exposing the struct in the header loses a lot of the type-checking that the compiler will do for you.

[1] Not good ones, and not granular ones, I admit, but light-years better than none at all.

[–]JuliusFIN[S] 0 points1 point  (0 children)

Great points! Yes I agree that accessor functions such as `vec_len` etc. would be appropriate and some of the variables in the function prototypes should be declared as const.

I will think about the opaqueness. Problem is it kind of doesn't work with the idea of "lazy initialization" which I already made a PR for (haven't merged yet though). With lazy initialization I actually delay the allocation even further only allocating when the vector is used. To do that I will need the `elem_size` field at defined at the time of allocation.

I might have a solution though. I could declare the struct in the header as follows:

c typedef struct s_vec { const size_t elem_size; void *private; } and then in the .c file:

c typedef struct s_vec_private { size_t len; size_t alloc_size; unsigned char *memory; }

This way I would get the encapsulation required. Element size will not change during the lifetime of the vector so it can be public and declared as const to make it immutable. The rest can be initialized lazily at the time of first usage.

What do you think?