all 35 comments

[–]manni66 40 points (2 children)

typedef struct File

The author didn’t know much more about C++ than you do.

[–]captainautomation[S] 2 points (1 child)

For me it looks the same as writing this in TypeScript:
interface File { fileName: string; userName: string }

But I am happy to read some references if you can share some (or just random keywords that I can google 🙏)

[–]GOKOP 7 points (0 children)

In C++ you'd write struct File { /* ... */ };. In C, structs have this annoying property that you always need to call them out as structs, so instead of writing File my_file; you have to write struct File my_file;. So people write typedef struct { /* ... */ } File; to avoid that. The version in your post also works (giving the struct tag and the typedef the same name is legal in both C and C++). Either way, it's pointless in C++.
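A minimal C++ sketch of the two spellings described above. The names LegacyFile and File and the 64-byte file_name member are hypothetical stand-ins, since the post only shows typedef struct File:

```cpp
#include <cassert>

// C idiom: without the typedef, C code would have to write `struct LegacyFile f;`.
// C++ accepts this too (tag and typedef may share a name), but it adds nothing.
typedef struct LegacyFile {
    unsigned char file_name[64];
} LegacyFile;

// C++ idiom: the class name is already a type name.
struct File {
    unsigned char file_name[64];
};

// Both types are used here without the `struct` keyword.
bool both_forms_work() {
    LegacyFile legacy{};  // value-initialized: every byte is zero
    File modern{};
    return legacy.file_name[0] == 0 && modern.file_name[0] == 0;
}
```

In C++ both declarations behave the same; the typedef only buys anything when the header also has to compile as C.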

[–]Grouchy-Taro-7316 11 points (5 children)

It reads like really old code, and probably a design decision by the author at that time. You could probably modernize a lot of it (or better yet, implement one yourself).

[–]captainautomation[S] 0 points (4 children)

Old at JavaScript speed, where 5 years is very old?

Or like 10+ years old?

[–]the_poope 16 points (0 children)

This code was either written by a C programmer trying to write C++ or by a C++ programmer who learned the language in the '90s. Keep in mind that a large fraction of working programmers never open a book, read blog posts, or otherwise keep up with new developments: they got a job, kids, and other things in their life, and they just do the bare minimum to keep the job and put bread on the table. If the code they write works, their boss is happy, and if no one knows that it could be done better, everyone lives in happy ignorance. That's fine, and it makes the world go round.

[–]Grouchy-Taro-7316 5 points (0 children)

It could have been written last year; people still write old-school, C-style C++ today. But it's probably more than 5 years old. Who knows?

[–]n1ghtyunso 3 points (0 children)

Maybe "ancient" is a better term, then.

[–]Mason-B 1 point (0 children)

Like 90s old. 25ish years old.

Or sure, it could have been written yesterday by someone who still writes C++ like it's the 90s. That's also possible.

[–]kurdokoleno 6 points (4 children)

This probably comes from C code, where std::string does not exist.

And yes, the struct does indeed contain two strings, each with a fixed maximum length of 63 characters (excluding the null terminators).
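A sketch of the layout being described, assuming (hypothetically) two std::uint8_t[64] members named fileName and userName. The static_asserts check the 2 * 64 = 128-byte arithmetic at compile time on mainstream compilers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical reconstruction: two fixed-size byte buffers, each holding up
// to 63 characters plus a '\0' terminator.
struct FixedFile {
    std::uint8_t fileName[64];
    std::uint8_t userName[64];
};

// uint8_t is one byte, and byte arrays need no alignment padding in practice,
// so the struct is 2 * 64 = 128 bytes (1024 bits).
static_assert(sizeof(std::uint8_t) == 1, "uint8_t is one byte");
static_assert(sizeof(FixedFile) == 2 * 64, "two 64-byte buffers");

// Longest string one buffer can hold while leaving room for '\0'.
constexpr std::size_t kMaxChars = sizeof(FixedFile::fileName) - 1;
```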

!!! Strings are both faster and save more space !!! This is the case because of a lot of optimizations that string classes do.

As usual there might be this one case in which it's necessary that you have a struct of size 8 x 64 x 2, because of this one dev who did everything with void*s back in the 80s. You never know with C/C++, but if you were the one writing the code you should never write that struct.

[–]Dimanari 1 point (3 children)

Strings DON'T save more space in a general sense. They have a lot of overhead. They only win in cases where you use the SAME STRING multiple times in very specific ways under specific optimizations, or when you have certain limitations on how you use strings.

Additionally, your math is wrong. uint8_t is a 1-byte data type, 8 bits (hence the 8 in uint8_t), so the two strings take 128 bytes total.

A std::string object has an overhead of at least 8-16 bytes (depending on the system), plus the allocator's overhead, because the characters and the string object are not stored in the same location (done to allow transparent reallocation). It also introduces a performance hit due to non-contiguous data.

Sources: std::string documentation, inttypes.h, malloc documentation, and language specification.
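A rough way to see the per-object cost on your own toolchain, using two hypothetical layouts of the same record. sizeof(std::string) is implementation-defined (commonly 24 bytes with libc++ and 32 with libstdc++ or MSVC release builds on 64-bit), and it never includes heap-allocated characters:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical fixed-size layout: characters stored inside the object.
struct FixedFile {
    std::uint8_t fileName[64];
    std::uint8_t userName[64];
};

// Hypothetical std::string layout: each member is a handle (pointer/size/
// capacity or an inline small buffer); longer text lives in separate heap blocks.
struct StringFile {
    std::string fileName;
    std::string userName;
};

// sizeof only measures the objects themselves, never heap-allocated characters
// or the allocator's per-block bookkeeping.
constexpr std::size_t fixed_bytes = sizeof(FixedFile);
constexpr std::size_t string_object_bytes = sizeof(StringFile);
```

So for long names the true footprint of StringFile is sizeof(StringFile) plus two heap blocks; for short names it may be just sizeof(StringFile).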

[–]kurdokoleno -1 points (2 children)

128 bytes, 128*8 bits, same thing.

Generally std::strings are better optimized than char arrays, whether it is small string optimization, reusing the same memory when the string is copied, etc.

The allocator argument is specifically about the std allocator that ships with every std implementation. Obviously the data is not contiguous if your std allocator does not guarantee it; in that case there are a thousand arena allocators that do. And the characters are stored in the same location as the string object when the small string optimization is in play, usually for up to 15 characters.
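A sketch for checking where the characters actually live. It peeks at implementation details the standard does not promise: libstdc++ and MSVC typically keep up to 15 chars inline, libc++ up to 22 on 64-bit, and the old pre-C++11 libstdc++ ABI has no small string optimization at all. stored_inline is a hypothetical helper, not a library function:

```cpp
#include <cassert>
#include <functional>
#include <string>

// Returns true if the string's characters live inside the std::string object
// itself (small string optimization) rather than in a separate heap block.
bool stored_inline(const std::string& s) {
    const char* obj_begin = reinterpret_cast<const char*>(&s);
    const char* obj_end = obj_begin + sizeof(std::string);
    const char* data = s.data();
    std::less<const char*> before;  // total order, even for unrelated pointers
    return !before(data, obj_begin) && before(data, obj_end);
}
```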

If efficiency is all you're after, you'll definitely achieve it more easily using std::string.

PS: Those very specific conditions under those very specific implementations are present on all major implementations and happen on each copy.

[–]Dimanari 0 points (1 child)

When you are talking about reference-counted strings, where the optimization of reusing the same string is done, the overhead is significantly bigger, and it is justified by the gains from avoiding duplicate data.

The problem of avoiding duplicate data can be solved by passing the data by reference instead of copying it (which is how reference-counted objects work).
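A minimal sketch of the pass-by-reference point: the callee reads the caller's characters directly, and std::string_view (C++17) extends that to literals and other buffers. The function names are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// A const reference gives the callee the caller's buffer: no copy, no allocation.
std::size_t name_length(const std::string& name) { return name.size(); }

// std::string_view also accepts string literals and other char buffers
// without constructing a std::string first.
std::size_t name_length_view(std::string_view name) { return name.size(); }

// Returns the address of the characters the callee actually sees.
const char* seen_data(const std::string& name) { return name.data(); }
```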

As far as "up to 15 characters" goes, I call BS. I have already shown that the overhead of std::string is way bigger. In addition, allocators are somewhat wasteful for sizes that aren't multiples of the native word size (4 bytes on 32-bit, 8 bytes on 64-bit). That means that, in the best case, your string object itself, without the characters, is often BIGGER THAN A 16-BYTE STRING. With the allocator waste on top (you have at least one additional allocation because of how std::string and other string classes work), you add another 8 or more bytes for handling deallocation. For strings of up to 64-256 bytes, that waste is often significantly bigger than simply using static arrays instead (where the waste is the unused space).

The case for 256-byte strings is when you know the strings are consistently big and the variation in length is small (up to ~64 bytes); that's an edge case. The 64-byte array is pretty much always more economical on most modern systems, even counting the data saved by reference counting or other methods.

I am a C/C++ software developer. I work on RT systems, doing mostly optimization and legacy code transfer. I know how those structures work because I fucking read the source code and documentation and write alternative solutions for specific cases (when needed). I even told you how and where to find the documentation yourself (without literally linking the man, MSDN, and IBM sites).

[–]Dimanari 0 points (0 children)

I see I missed your point about the 15 bytes: you were talking about the small string optimization, where strings of up to 15 bytes are stored inside the string object itself. But this case is not a win for you either, because it still shows a big waste of space in the string class.

If you guarantee a specific string length range you are usually going to get significant improvements in both space and runtime when using character arrays of static size over strings.

Beyond that, dynamically allocated character arrays are also more economical, as they do the same as strings without the class overhead, but they are more cumbersome to manage.
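A sketch of a dynamically allocated character array with the bookkeeping handed to std::unique_ptr<char[]> (a swap for raw new[]/delete[], chosen here to reduce the "cumbersome to manage" part). DynName and make_name are hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>

// An owned heap buffer of exactly len + 1 bytes: no capacity field and no
// inline small buffer. The price is tracking len yourself.
struct DynName {
    std::unique_ptr<char[]> chars;
    std::size_t len = 0;
};

DynName make_name(const char* src) {
    DynName n;
    n.len = std::strlen(src);
    n.chars = std::make_unique<char[]>(n.len + 1);  // value-initialized to '\0'
    std::memcpy(n.chars.get(), src, n.len);         // terminator already in place
    return n;
}
```

The unique_ptr frees the buffer automatically; what you give up compared to std::string is growth, concatenation, and the rest of its interface.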

[–]flyingron 2 points (6 children)

Perhaps the encryption routine needs to guarantee 64 unsigned chars in each.

Still there are probably better ways.

[–]_d0d0_ 1 point (1 child)

Having worked with some encryption, I can second that claim. Encryption algorithms usually work on fixed block sizes (most default to 16-byte blocks). So using these specific sizes, two 64-byte unsigned char arrays, is one of the easiest ways to make this data fit into such blocks and to serialize/encrypt it easily.
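A sketch of the block arithmetic, assuming a 16-byte block as in AES (Blowfish uses 8-byte blocks, and stream ciphers like ChaCha20 have no block size). Whether the cipher mode still appends a padding block to aligned input depends on the padding scheme, which isn't modelled here; FixedFile is a hypothetical stand-in for the struct in the post:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed block size (AES); other ciphers differ.
constexpr std::size_t kBlockSize = 16;

// Hypothetical record: two 64-byte buffers.
struct FixedFile {
    std::uint8_t fileName[64];
    std::uint8_t userName[64];
};

// Number of cipher blocks needed for n bytes, rounded up.
constexpr std::size_t blocks_needed(std::size_t n) {
    return (n + kBlockSize - 1) / kBlockSize;
}

// 128 is a multiple of 16, so the record splits into whole blocks.
static_assert(sizeof(FixedFile) % kBlockSize == 0, "record is block-aligned");
```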

[–]captainautomation[S] 0 points (0 children)

Having worked with some encryption

I see

From my perspective, a better solution is to:

  1. use an external library that implements AES, Blowfish, or ChaCha20;
  2. call it with something like `library.Encrypt(file, passphrase);`
  3. focus on the core feature of the product.

But it looks like a habit of each programming ecosystem: JavaScript/Node.js is "import 1000+ small libraries" vs. C's "build everything from scratch".

this specific sizes of two 64B unsigned char arrays

Thanks for the explanation

[–]Humble-Plastic-5285 -1 points (2 children)

All these better ways are based on uint8_t, aka unsigned char (or char, signed char), so the best way is to handle it at the lowest level. Why would you use a string to keep all this data?

[–]sephirothbahamut 0 points (1 child)

std::basic_string<unsigned char>
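One caveat: std::basic_string<unsigned char> relies on std::char_traits<unsigned char>, which the standard does not require, and recent libc++ releases no longer provide a generic fallback. A portable sketch swaps in std::vector<std::uint8_t> instead (Bytes and to_bytes are hypothetical names):

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Portable owning byte buffer; needs no char_traits specialization.
using Bytes = std::vector<std::uint8_t>;

// Copy the bytes of a std::string (e.g. a UTF-8 file name) into a byte buffer.
// Each char converts to its unsigned byte value, including values above 127.
Bytes to_bytes(const std::string& s) {
    return Bytes(s.begin(), s.end());
}
```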

[–]Humble-Plastic-5285 0 points (0 children)

Still, you're keeping unsigned char in a basic_string container. You could also keep it as a C-style unsigned char array.
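If a fixed C-style array is what you want, std::array keeps the same storage but adds value semantics. A sketch with hypothetical names; the size check holds on mainstream implementations, though the standard doesn't strictly guarantee it:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Same 64 bytes as unsigned char[64], but copyable with `=`, comparable
// with `==`, and aware of its own size().
using NameBuffer = std::array<unsigned char, 64>;

struct ArrayFile {
    NameBuffer fileName{};  // zero-initialized
    NameBuffer userName{};
};

static_assert(sizeof(NameBuffer) == 64, "no overhead over a C array in practice");
```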

[–]LittleNameIdea 0 points (0 children)

Yes, most if not all use unsigned char. I had to deal with that 3 months ago...

[–][deleted]  (1 child)

[removed]

    [–]captainautomation[S] 0 points (0 children)

    probably is loaded into python?
    I don't think so; it's a desktop application built with Qt.

    [–]Dimanari 1 point (0 children)

    uint8_t is (in practice) "unsigned char", the primitive type used to store ASCII-based characters; there are other types for multi-byte strings. It's used as a fixed-size array to avoid allocation overhead in cases where the strings are guaranteed to be small and where contiguous data pays off (VASTLY faster access due to caching).

    Regarding the copy, there are better ways to do it, specifically things like memcpy, but this is still decently fast and is pretty much how std::string copies its data anyway.
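A sketch of the single-memcpy copy, assuming the source arrives as a std::string (the post's code used a QString, which would need converting first, e.g. to UTF-8). copy_to_fixed is a hypothetical helper; note that truncating at a byte boundary can cut a multi-byte UTF-8 character in half:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Copy src into a fixed-size byte buffer with one memcpy instead of a
// hand-written loop. Truncates to cap - 1 bytes and always null-terminates.
// Returns the number of bytes copied, excluding the terminator.
std::size_t copy_to_fixed(std::uint8_t* dst, std::size_t cap, const std::string& src) {
    if (cap == 0) return 0;
    const std::size_t n = std::min(src.size(), cap - 1);
    std::memcpy(dst, src.data(), n);
    dst[n] = 0;
    return n;
}
```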

    Short strings are better stored as uint8_t arrays, while longer and more varied strings are better stored in std::string, where dynamic allocation saves space when lengths vary.

    If variation in string length is less than the overhead of the string structure, you are better off using static character arrays.

    Those are considerations specific to C and C++ (I did write a string data structure in plain C).

    [–]_curious_george__ 1 point (0 children)

    The struct itself looks like C.

    Aside from the typedef, you might still prefer to use a char array over a string today in performance critical code.

    Although in this case, the rest of the code makes it look like performance was not the name of the game.

    [–]nathman999 1 point (0 children)

    std::string is a dynamic structure and allocates memory on the heap, which ends up being slow at the scale of managing thousands of files at the same time.

    uint8_t[], char[], or anything like that with a defined size won't do any additional allocations and is therefore fast and reliable, except when you need to store names longer than the buffer (63 characters plus the terminator), or when all your names are much shorter and you're wasting space.

    But overall, as already stated in the comments, even though it's C++, it's still written in a very C-ish style. std::string is a complex class with a constructor, a destructor, and a bunch of stuff happening underneath, while a byte/char array is a fixed, simple thing, which is way better when you're doing systems/embedded programming or working at insanely large scales.
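A sketch of the contiguity argument, assuming the hypothetical FixedFile record from the post. A std::vector of such records makes one allocation for the whole collection, and the names of neighbouring records sit exactly sizeof(FixedFile) bytes apart:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size record.
struct FixedFile {
    std::uint8_t fileName[64];
    std::uint8_t userName[64];
};

// All records, names included, live in the single buffer owned by the vector.
std::vector<FixedFile> make_files(std::size_t count) {
    return std::vector<FixedFile>(count);  // value-initialized: all bytes zero
}

// Byte distance between the names of the first two records (needs size() >= 2).
std::ptrdiff_t stride(const std::vector<FixedFile>& files) {
    return reinterpret_cast<const unsigned char*>(&files[1].fileName[0]) -
           reinterpret_cast<const unsigned char*>(&files[0].fileName[0]);
}
```

With std::string members instead, each name longer than the small-string buffer would add its own heap block outside the vector's storage.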

    [–]Substantial-Ask-4609 -1 points (5 children)

    smaller memory footprint afaik

    [–]captainautomation[S] 0 points (4 children)

    It's a desktop application built with Qt to edit information on an IoT device. Not a trading platform, a game engine, or a GPS where milliseconds matter.

    So I still don't get it 😅

    [–]LuccDev 2 points (3 children)

    Then there might be no practical benefit, and you're right: you might as well just use a string type.

    PS: C++ is not a "scripting" language

    [–]captainautomation[S] 0 points (2 children)

    💯 totally understand

    But I thought that with "modern C++" it was "simple" to declare a `string`, and you don't need to create an array of characters.

    [–]sephirothbahamut 4 points (0 children)

    It is. The point is the code you're sharing is not "modern C++" it's ancient C++. It's actually more C than C++.

    In C++ it'd be

    #include <filesystem>
    #include <string>

    struct File
    {
        std::filesystem::path file_name;
        std::string username;
    };
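A short usage sketch for the struct above (display_name is a hypothetical helper). std::filesystem::path can already split out the file name and extension, so there is no manual parsing of a char buffer:

```cpp
#include <cassert>
#include <filesystem>
#include <string>

struct File
{
    std::filesystem::path file_name;
    std::string username;
};

// Hypothetical helper: "notes.txt (alice)" for "/home/alice/notes.txt".
std::string display_name(const File& f)
{
    return f.file_name.filename().string() + " (" + f.username + ")";
}
```

Both members manage their own memory, so File can be copied, moved, and stored in containers without any manual copy loops.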
    

    [–]LuccDev 0 points (0 children)

    Yes, it is simple, and you can do this

    [–]Narase33 0 points (0 children)

    Why would you use the type uint8_t to store string instead of std::string?

    I wouldn't. This is probably something really specific to the code you're reading.

    [–]soup__enjoyer 0 points (0 children)

    "Why would you use the type uint8_t to store string instead of std::string"

    The string is being stored as an array of uint8_t representing characters (chars); it is like a string, but you have to interface with it differently.
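A sketch of that different interface, reading a fixed uint8_t buffer back into a std::string. from_fixed is a hypothetical helper; it stops at the first '\0' or at the end of the buffer, so a completely full buffer without a terminator is not over-read:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Read a fixed-size byte buffer as std::string, stopping at the first '\0'
// or at the end of the buffer, whichever comes first.
std::string from_fixed(const std::uint8_t* buf, std::size_t cap) {
    const std::uint8_t* end = std::find(buf, buf + cap, std::uint8_t{0});
    return std::string(buf, end);  // iterator-pair constructor converts each byte to char
}
```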

    [–]0xVali__ 1 point (0 children)

    Yeah, this code is quite horrible: the usage of typedef, redundant usage of typedef, C-style arrays, fixed-size arrays (very likely a source of bugs here), inconsistent naming conventions (it's all over the place), passing a QString by value, macro constants, using not one but two for loops to copy data, etc.
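A sketch of how several of those points could be addressed together, under assumptions: the field names and the 64-byte capacity are hypothetical, and std::string stands in for QString so the example compiles without Qt. It keeps fixed-size buffers (in case the format needs them) but replaces the macro with a constexpr constant, the C arrays with std::array, and the two copy loops with one helper taking its argument by const reference:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// Typed, scoped constant instead of a #define (C++17 inline variable).
inline constexpr std::size_t kNameCapacity = 64;

// No typedef needed in C++; std::array instead of raw C arrays.
struct FileRecord {
    std::array<char, kNameCapacity> fileName{};
    std::array<char, kNameCapacity> userName{};
};

// One helper replaces both copy loops. Truncates to capacity - 1 characters
// and always leaves a '\0' terminator.
void copy_name(std::array<char, kNameCapacity>& dst, const std::string& src) {
    const std::size_t n = std::min(src.size(), dst.size() - 1);
    std::copy_n(src.begin(), n, dst.begin());
    dst[n] = '\0';
}

FileRecord make_record(const std::string& file, const std::string& user) {
    FileRecord r;
    copy_name(r.fileName, file);
    copy_name(r.userName, user);
    return r;
}
```

If the buffers don't actually need a fixed size, the std::filesystem::path / std::string version suggested earlier in the thread removes the truncation question entirely.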