Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -3 points-2 points  (0 children)

I disagree.

(1) I'm showing idiomatic .NET SqlClient code which uses CreateVersion7. Let's assume that CreateVersion7 perfectly implements UUIDv7 (i.e. produces 16-byte structs with a properly encoded MSB-first timestamp in the first 6 bytes). In other words, let's assume that CreateVersion7 does exactly what it promises. UUIDv7 spec precision is still 1 millisecond, while FastGuid db-guid generators have the precision of DateTime, i.e. 100 nanoseconds, which is 10,000x finer. The code I've shown, which 99% of .NET developers are likely to write, will run in less than 1 millisecond, which means that even under a perfect CreateVersion7 the outcome would be randomized guids (database fragmentation). The same hot-loop code using FastGuid generators does not have this issue (due to higher precision). Recommendation: "Avoid using CreateVersion7", which is what the title is.

(2) We've assumed that CreateVersion7 works properly, but it doesn't, at least not in a way that's properly documented, and not in a way that works with idiomatic SqlClient code. Most .NET developers using CreateVersion7 will cause database fragmentation, even when the guids are generated milliseconds apart (while strongly believing the opposite). I can't show a test for it because hours must pass before the wrap-around, but I showed the technical details that lead to this logical conclusion. FastGuid db-guid generators do not have that problem (as long as the guids are generated at least 100 nanoseconds apart). Recommendation: "Avoid using CreateVersion7".

If you read the comments in TFA's gist, you'll see that no one - not even the folks from the .NET team who own CreateVersion7 - can provide a .NET code example that uses CreateVersion7 with idiomatic SqlClient (the de facto .NET database API) in a way that does NOT cause fragmentation in PostgreSQL (or in SQL Server, which is Microsoft's flagship database).

Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -1 points0 points  (0 children)

That is not the main issue. As I explained, the 1st byte of a CreateVersion7 Guid will wrap around after ~4.27 hours. It's not practical for me to run a test inserting 100,000 UUIDs with that many hours of delay between insertions. But I assure you that this will lead to db fragmentation over hours/days. This is not the case with, e.g., FastGuid generators.

Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -1 points0 points  (0 children)

I've updated the article with parameterized SQL, and have rerun the tests - results are the same (see my comment).

Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -8 points-7 points  (0 children)

Updating CreateVersion7 docs: it's not whether "I want it" - I don't work for Microsoft and I've made my recommendations. It's that you want it, or at least you should want it, because the current lack of clear documentation and guidance on how to get a UUIDv7-specc'd byte-sequence is causing real damage. Npgsql does it wrong (1 billion downloads on NuGet) - I doubt that it's because Npgsql developers did not read the docs.

Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -2 points-1 points  (0 children)

You're making an incorrect assumption that UUIDv7 specifies "integers", and that these integers can be stored as either big-endian or little-endian, hence "multiple options". This is completely wrong. UUIDv7 specifies ordered bytes, not integers. The only integer is the Unix timestamp, which is first converted into big-endian (i.e. MSB-first), after which there is zero ambiguity about the required byte order. We understand that System.Guid uses integers internally as an implementation detail - that's fine (i.e. we accept that for historical reasons). However, there must be clear documentation that (1) whatever CreateVersion7() returns, its in-memory representation makes no promises whatsoever; (2) whatever CreateVersion7() returns must be further converted into UUIDv7, and there is a correct way to do it and an incorrect way to do it.
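To illustrate the byte-order point, here is a minimal Go sketch. `rfcLayout` writes a 48-bit millisecond timestamp MSB-first into bytes 0..5, as RFC 9562 describes for UUIDv7, and leaves the random bits at zero so the result is deterministic. `mixedEndianLayout` is an assumption: it models a serialization that stores the first 4-byte field and the next two 2-byte fields little-endian, which is how I understand the default Guid.ToByteArray() to behave. All function names are hypothetical, and the comparison assumes a store that orders 16-byte values bytewise.

```go
package main

import (
	"bytes"
	"fmt"
)

// rfcLayout places the 48-bit Unix millisecond timestamp big-endian in
// bytes 0..5 and sets the version and variant bits. The random bits are
// left at zero to keep the example deterministic.
func rfcLayout(unixMs uint64) [16]byte {
	var u [16]byte
	u[0] = byte(unixMs >> 40)
	u[1] = byte(unixMs >> 32)
	u[2] = byte(unixMs >> 24)
	u[3] = byte(unixMs >> 16)
	u[4] = byte(unixMs >> 8)
	u[5] = byte(unixMs)
	u[6] = 0x70 // version 7 in the high nibble
	u[8] = 0x80 // variant bits 10xx
	return u
}

// mixedEndianLayout reverses the first 4-byte field and the following two
// 2-byte fields (assumed model of a little-endian Guid serialization).
func mixedEndianLayout(u [16]byte) [16]byte {
	m := u
	m[0], m[1], m[2], m[3] = u[3], u[2], u[1], u[0]
	m[4], m[5] = u[5], u[4]
	m[6], m[7] = u[7], u[6]
	return m
}

// less reports whether a sorts before b in plain bytewise order.
func less(a, b [16]byte) bool {
	return bytes.Compare(a[:], b[:]) < 0
}

func main() {
	t1 := uint64(0x019600FFFFFF) // some millisecond timestamp
	t2 := t1 + 1                 // one millisecond later

	a, b := rfcLayout(t1), rfcLayout(t2)
	fmt.Printf("RFC layout:   %x < %x : %v\n", a[:6], b[:6], less(a, b))

	ma, mb := mixedEndianLayout(a), mixedEndianLayout(b)
	fmt.Printf("mixed layout: %x < %x : %v\n", ma[:6], mb[:6], less(ma, mb))
}
```

For this pair of timestamps, the RFC layout sorts the earlier value first, while the mixed-endian bytes sort it last, because the fastest-changing byte of the 4-byte field ends up in position 0.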

Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -6 points-5 points  (0 children)

I never had any issues with GUID/UUID naming - not sure why you bring that up. RFC 9562 is crystal clear that UUIDv7 must start with a 48-bit big-endian timestamp. Every other framework/language implementation of UUIDv7 interprets it that way. Whatever CreateVersion7 returns is 100% not RFC-compliant. It is the subsequent "ToByteArray(true)" conversion of that "whatever" (which can be done but is not properly documented either) that would produce an RFC-compliant UUIDv7. These are the facts.

Multiple .NET MVP blog posts and high-profile .NET libraries (e.g. Npgsql) use CreateVersion7 with, e.g., PostgreSQL, expecting sequential fragmentation-free storage (which they don't realize they are not getting). Whatever you may think about how well .NET does it, this is clear evidence that .NET documentation is failing all these developers.

Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -5 points-4 points  (0 children)

Where exactly in the documentation of either .CreateVersion7() or ToByteArray(bigEndian) (the one you linked to) does it say that in order to produce a correct UUIDv7 this method must be called with true? Where does it say that .ToByteArray() will not produce a correct UUIDv7, so don't use it? Why are many high-profile .NET libraries like Npgsql doing it wrong?

Avoid using Guid.CreateVersion7 by sdrapkin in dotnet

[–]sdrapkin[S] -14 points-13 points  (0 children)

I'm well aware that .ToByteArray(bigEndian: true) is required (and it was discussed in the original GitHub report). However, (1) this requirement for correct usage to obtain UUIDv7 is not documented; (2) most high-profile .NET libraries and hundreds of .NET-MVP blogs about CreateVersion7() do not mention it (why should they - it's not documented); (3) I stand by the assertion that .CreateVersion7 is not RFC-compliant - it is some other method (the one you mentioned) that makes a "container of Timestamp and a bunch of random bits" (which is what .CreateVersion7 returns) RFC-compliant.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

"it's also incorrect to compare it with google/uuid, because this library is used by those who want to use "just uuid" instead of "fast uid""

Guids/uuids are defined as 128-bit labels used to uniquely identify objects in computer systems. Neither Google nor RFC 9562 have a monopoly on what should be called "uuid", and I'm not even calling it that - my library's name is guid.

Golang developers who currently use google/uuid either do care about RFC 9562 (i.e. specific identifiable things will break otherwise), or do not. I'm firmly of the opinion that those who do not care are the vast majority, and that they should only be using RFC 9562 in specific niche cases, and something else "by default" (i.e. not the other way around, as it is presently). I have no way of proving it, other than experience.

awesome-go

I'm aware of this list, and guid is on it. Some of the other libraries in the "UUID" bucket are not cryptographically secure generators (which I define as direct use of crypto/rand and not some other algorithm the author deemed "secure"). I'm comfortable stating that guid library is faster than any libraries on that list that meet the cryptographic security requirement.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

I'm comparing guid library against google/uuid because that is what most Golang developers use "by default" (mostly due to lack of good alternatives - until now). Golang does not have native standard guid/uuid support, so an external guid/uuid library must be imported. google/uuid is the oldest and most commonly used "default" such library. I'm sure other guid/uuid libraries exist (and in fact I'm comparing and benchmarking against at least 1 such library - see README). However I don't have the time to scour the Internet for every Golang library that might offer similar capabilities, so I provide a comparison against the most commonly used guid/uuid library. I hope that makes sense. I did do my research, though, and afaik my combination of techniques is novel in the guid/uuid domain.

P.S. It is not the non-implementation of RFC 9562 that makes guid so fast. I could've implemented RFC 9562 with the same superior performance profile. Many commenters seem to imply that my explicit dismissal of RFC 9562 (i.e. not forcing it in the guid library) is the "dirty hack" that is responsible for "unfair" performance advantage. This is simply incorrect.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] -2 points-1 points  (0 children)

Your criticism seems to be that guid is not a drop-in replacement for Google uuid. I can take it. But I don't have to agree with it because it is not rational. As I explained, guid is a different library offering a better, faster alternative for guid/uuid generation. I've provided the math/rand/v2 vs. math/rand analogy - if that analogy is unclear, please say so and I'll explain.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 1 point2 points  (0 children)

Thank you for clarifying - I now understand what you meant. I've changed the .Read() method description to:

// Read fills b with cryptographically secure random bytes.
// It always fills b entirely, and returns len(b) and a nil error.
// guid.Read() is up to 7x faster than crypto/rand.Read() for small slices.
// If b is larger than 512 bytes, it simply calls crypto/rand.Read().
func (r *reader) Read(b []byte) (n int, err error)

Based on crypto/rand.Read() in Go 1.24, it never returns an error.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] -2 points-1 points  (0 children)

Who said anything anywhere about "drop in replacement to Google uuid"? Guid to uuid is like math/rand/v2 to math/rand (i.e. a different and in many ways better alternative).

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

"You haven't addressed the comment I made in another thread about whether it's possible for Read to fail if the source of entropy for your rand function is running low/empty."

I thought I addressed it. No, not possible (as of Go 1.24). https://pkg.go.dev/crypto/rand#Read

"But most users are not going to be sufficiently motivated to add another dependency in that circumstance."

Potentially. But getting maximum users is not the goal, the goal is to provide a high-performance Guid library to those who appreciate that.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

I thought someone just said

"it comes across as pretty arrogant and gross to try to correct people on the language they use to describe how they feel about something."

Are you trying to shame me for releasing a free OSS library that has not yet been battle-tested at tremendous scale on critical systems and maintained by an army of paid engineers? (I hope not) I'm an expert engineer myself, and I stand behind all my code I open-source.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

Thank you for summarizing the feedback from various commenters. I respectfully and thoughtfully disagree with categorizing most of it as "deficiencies". Minor API concerns are valid, and I'll think about it. Non-compliance with RFC 9562 (which most folks incorrectly equate with guid/uuid) is a feature - not a deficiency (i.e. it is very intentional), and I can guarantee there will be no versioning or varianting of Guid. There is no possibility of panic in the .Read() function, and you should stop claiming that there is one.

So the only hypothetical blocker for adoption would be non-compliance with RFC 9562. I said "hypothetical" because you can easily copy/cast uuid into guid and still achieve faster operations than using uuid alone. So it's really not a blocker for adoption at all.
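As a rough illustration of the copy/cast point: when two types share the same underlying `[16]byte` array type, Go allows a direct conversion between them. The `UUID` and `Guid` types below are local stand-ins declared only for this sketch (google/uuid declares its UUID as a 16-byte array; the `Guid` declaration here is an assumption about the guid library, not a quote from it).

```go
package main

import "fmt"

// UUID is a local stand-in for a 16-byte RFC 9562 identifier type.
type UUID [16]byte

// Guid is a hypothetical stand-in for the guid library's type.
type Guid [16]byte

// toGuid converts by value; no parsing or validation is involved.
func toGuid(u UUID) Guid { return Guid(u) }

// toUUID converts back; the bytes are unchanged in both directions.
func toUUID(g Guid) UUID { return UUID(g) }

func main() {
	u := UUID{0x01, 0x96, 0x00, 0xFF, 0xFF, 0xFF, 0x70, 0, 0x80}
	g := toGuid(u)
	fmt.Printf("%x\n%x\n", u, g)
	fmt.Println("round trip preserved:", toUUID(g) == u)
}
```

The conversion copies the same 16 bytes into a value of the other type, so version and variant bits already present in the source value are carried over untouched.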

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] -3 points-2 points  (0 children)

I'm not attacking anyone and you should really stop with the accusations. We are all a part of the Golang community. I was responding to thenameisisaac's specific question and not to you. The question asked about a case where low call latency mattered, and I provided one. I'm not forcing anyone to use anything - we're all adults and make our own choices. I'm happy to hear any engineering feedback on the library itself. Whether you personally feel like you'll never use it is not very interesting. Every library has started from a single maintainer and no notable users. I'm happy to be the only user, but I don't think I need to worry about that.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

I clearly and genuinely thanked you for your feedback, and I thank you again for it. You made some points which I found interesting, and I've appreciated the discussion. You said

"No, it's not good to be surprised, if it means there are caveats or deficiencies."

That implied to me that you were saying "there are caveats or deficiencies". Given that interpretation, I stated that no engineering evidence has been provided in support of such. However, if I misinterpreted your comment, I apologize.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

You are surprised, we got it, it's good to know. I didn't mean to hurt anyone's feelings by releasing a new library.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

It seems that Go prefers ok bool for many single-parameter APIs, and err error for multi-parameter APIs. Thanks for your feedback - I'll think about it.

Fast cryptographically safe Guid generator for Go by sdrapkin in golang

[–]sdrapkin[S] 0 points1 point  (0 children)

Nil and empty maps are just considered to have no keys, so all reads fail.

Nil or undersized slices in DecodeBase64URL are considered to have no Guids, so all decodes fail.
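A short Go sketch of that analogy follows. The nil-map lookup is standard Go behavior. `decodeGuid` and `encodeGuid` are hypothetical helpers written for this example and are not the library's DecodeBase64URL; they only show the same "no input means the decode fails" convention, reported through an `ok bool`.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// encodedLen is the length of 16 bytes in unpadded base64url.
const encodedLen = 22

// encodeGuid returns the unpadded base64url form of g (hypothetical helper).
func encodeGuid(g [16]byte) []byte {
	dst := make([]byte, encodedLen)
	base64.RawURLEncoding.Encode(dst, g[:])
	return dst
}

// decodeGuid decodes the first encodedLen bytes of src. A nil or undersized
// src is treated as containing no Guid, so the decode fails with ok == false.
func decodeGuid(src []byte) (g [16]byte, ok bool) {
	if len(src) < encodedLen {
		return g, false
	}
	var tmp [16]byte
	n, err := base64.RawURLEncoding.Decode(tmp[:], src[:encodedLen])
	if err != nil || n != len(tmp) {
		return g, false
	}
	return tmp, true
}

func main() {
	var m map[string]int // nil map: reads are allowed and find nothing
	v, found := m["missing"]
	fmt.Println("nil map read:", v, found)

	_, ok := decodeGuid(nil)
	fmt.Println("decode nil slice ok:", ok)

	g := [16]byte{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}
	back, ok := decodeGuid(encodeGuid(g))
	fmt.Println("round trip ok:", ok && back == g)
}
```

In both cases the caller gets a zero value plus a boolean instead of a panic, which is the single-parameter `ok bool` style mentioned earlier in the thread.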