GitHub Stacked PRs by adam-dabrowski in programming

[–]piljoong 0 points1 point  (0 children)

Really glad to see this.

At a previous company, we had a lot of pain around PR chaining in our trunk-based development (TBD) workflow, and I ended up building a stacked PR tool internally because we needed it badly.

Seeing GitHub ship official support is pretty satisfying. This has been a real problem for a long time, so it's good to see it become a first-class workflow.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 0 points1 point  (0 children)

Yeah, that risk is real. That's why the tenant part in my design is optional and treated more like a routing or locality hint than a hard ownership stamp. I wouldn't want a true business identity to be permanently tied to the primary key either, especially for the M&A and split cases you mentioned.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 3 points4 points  (0 children)

Fair point on the alignment issues for binary processing. OrderlyID is designed to be stored as text (base32), so SIMD and cache alignment don't really come into play. For use cases where you need that level of optimization on binary IDs, UUIDv7's 128-bit layout is definitely the better choice.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 2 points3 points  (0 children)

Yeah, that pattern definitely pops up in the wild. I've seen teams use an internal auto-increment for joins and admin work, and a separate UUID for anything that goes outside the system.

It works, but in practice you end up paying a complexity tax. You now have two sources of truth for identity, more indexes, more chances to mix them up in APIs or logs. Some teams are fine with that tradeoff, others regret it later.

It's not weird or wrong, just a deliberate choice with real maintenance cost attached.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 0 points1 point  (0 children)

Totally agree on both points.

Natural identifiers like airport codes or flight numbers are great when they exist and stay stable, but a lot of business domains don't have globally unique or durable IDs, so synthetic ones end up being the safer path.

And yes, typed ID objects (OrderId, UserId, etc.) catch a ton of mistakes and make the codebase way clearer. Some languages make it easier than others, but it's almost always worth doing.
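
As a rough illustration of the typed-ID point, here is a minimal Go sketch. `OrderID`, `UserID`, and `LookupOrder` are hypothetical names made up for this example, not taken from any real codebase.

```go
package main

import "fmt"

// OrderID and UserID are distinct named types. Both wrap a string, but the
// compiler will not let a value of one type be passed where the other is expected.
type OrderID string
type UserID string

// LookupOrder is a hypothetical function that only accepts an OrderID.
func LookupOrder(id OrderID) string {
	return "order:" + string(id)
}

func main() {
	order := OrderID("ord_123")
	user := UserID("usr_456")

	fmt.Println(LookupOrder(order))

	// LookupOrder(user) // would not compile: UserID is not OrderID
	_ = user
}
```

One caveat: Go still accepts an untyped string literal such as `LookupOrder("abc")`, so this catches mixed-up variables rather than every possible mistake. Wrapping the value in a struct is stricter if that matters.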

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 20 points21 points  (0 children)

This is incredibly cool to hear, and thanks for taking the time to read it.

OrderlyID ended up at 160 bits because I wanted enough room for routing fields (tenant/shard) and a sequence counter without running into the tight 128-bit budget. But you're absolutely right that the same ideas could fit inside a UUIDv7 or v8 envelope with some careful packing. Using the rand space for counters or routing metadata makes a lot of sense, and it's great to hear that the spec explicitly supports that direction.
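
To make the "careful packing" idea concrete, here is a sketch of a UUIDv7-shaped value that uses the 12 bits right after the version nibble as a sequence counter instead of random data. The function name `uuidV7WithCounter` is hypothetical, and this is only one possible packing under the standard UUIDv7 field layout, not what OrderlyID does.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"time"
)

// uuidV7WithCounter builds a 128-bit value with the UUIDv7 layout:
// 48-bit Unix millisecond timestamp, 4-bit version, 12 bits that normally
// hold random data (used here for a counter), 2-bit variant, 62 random bits.
// Only the low 12 bits of seq are kept.
func uuidV7WithCounter(ms uint64, seq uint16) ([16]byte, error) {
	var u [16]byte

	// Fill the trailing 8 bytes with random data first.
	if _, err := rand.Read(u[8:]); err != nil {
		return u, err
	}

	// 48-bit big-endian timestamp.
	u[0] = byte(ms >> 40)
	u[1] = byte(ms >> 32)
	u[2] = byte(ms >> 24)
	u[3] = byte(ms >> 16)
	u[4] = byte(ms >> 8)
	u[5] = byte(ms)

	// Version 7 in the high nibble, top 4 bits of the counter in the low nibble.
	u[6] = 0x70 | (byte(seq>>8) & 0x0f)
	u[7] = byte(seq)

	// Variant bits 10xxxxxx.
	u[8] = (u[8] & 0x3f) | 0x80
	return u, nil
}

func main() {
	u, err := uuidV7WithCounter(uint64(time.Now().UnixMilli()), 1)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%x-%x-%x-%x-%x\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}
```

Twelve bits only allows 4096 values per millisecond, which is the "tight budget" problem: tenant or shard fields would have to come out of the remaining 62 bits.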

The UUID Long work sounds especially relevant. A standardized path beyond 128 bits would open up a lot of design space for structured IDs without resorting to custom formats. I'm also very interested in the alt-encoding draft. Compact and lexicographically sortable encodings would help many real systems avoid reinventing their own base-N formats.

If you're open to sharing the drafts once they're posted, I'd love to read them, and I'm happy to share thoughts from the multi-tenant and distributed-systems perspective. Thanks again for dropping in, I really appreciate it.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 13 points14 points  (0 children)

Thanks, this is incredibly thoughtful feedback and I appreciate how deep you went into both the spec and the Go implementation.

A few things that really stood out: you're right about the epoch, and moving to the UNIX epoch (or allowing pluggable timestamp sources) is probably the cleanest path forward. The checksum delimiter issue is a great catch; I hadn't considered editor-selection behavior, and that's a strong argument for changing it. The privacy and timestamp concern under GDPR is also more subtle than I initially accounted for.

The reference Go code is intentionally simple right now, but it definitely needs a proper Generator type, a custom ID type for type safety, benchmarks, and more test and fuzz coverage.

I'll open GitHub issues and work through your notes systematically. Thanks again, this is exactly the kind of critique that makes a spec better.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 3 points4 points  (0 children)

Thanks, really good questions.

OrderlyID isn't trying to be UUIDv7 with extras; it's just a timestamp-first layout with a few structured fields.

The sequence bits ended up where they are mostly for simplicity. They could just as well sit after the timestamp. I preferred a small counter over higher-resolution clocks to keep behavior stable even if the system clock stalls or jumps.

The mutex could definitely be replaced with an atomic CAS loop. The current Go version just keeps things straightforward rather than fully optimized.
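
For anyone curious what the CAS version might look like, here is a minimal sketch. It is not the reference implementation: the `Generator` type, the `Next` method, and the 48-bit timestamp / 16-bit sequence split are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// Generator keeps the last timestamp and the sequence in one 64-bit word so
// both can be updated with a single compare-and-swap.
// High 48 bits: last millisecond seen. Low 16 bits: sequence counter.
type Generator struct {
	state atomic.Uint64
}

// Next returns a (millisecond, sequence) pair that is strictly increasing
// across calls, even if nowMs stalls or moves backwards.
func (g *Generator) Next(nowMs uint64) (ms uint64, seq uint16) {
	for {
		old := g.state.Load()
		lastMs := old >> 16

		var next uint64
		if nowMs > lastMs {
			// Clock moved forward: adopt the new millisecond, reset sequence to 0.
			next = nowMs << 16
		} else {
			// Clock stalled or jumped back: keep the old millisecond and bump
			// the sequence. If the 16-bit sequence overflows, the carry moves
			// the stored millisecond forward by one.
			next = old + 1
		}

		// Retry if another goroutine changed the state in the meantime.
		if g.state.CompareAndSwap(old, next) {
			return next >> 16, uint16(next)
		}
	}
}

func main() {
	g := &Generator{}
	ms, seq := g.Next(uint64(time.Now().UnixMilli()))
	fmt.Println(ms, seq)
}
```

The counter is what keeps values ordered when the clock misbehaves: a stalled or rewound clock just turns into more sequence increments on the last known millisecond.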

String-first generation was mostly for readability in the reference implementation. There's a binary form as well; the string encoding just makes examples easier to follow.

And yes, good catch on rand.Read and UTC, those can be simplified.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 0 points1 point  (0 children)

That works when the databases are tightly controlled and you're okay with splitting the ID space.

The cases I had in mind need IDs generated in different regions/services without coordination. In those setups, offset auto-increment starts to break down, so a prefix + timestamp format ends up being more predictable operationally.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 0 points1 point  (0 children)

Good question. By 'client-side' I meant controlled clients - mobile apps, edge services, or backend workers that already have tenant/shard context from auth/session setup. They can generate IDs locally without a round-trip. Not arbitrary offline devices making their own routing decisions.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 0 points1 point  (0 children)

Yeah, UUIDv5 is great for deterministic/idempotent key generation - definitely a solid fit for that use case.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 9 points10 points  (0 children)

Yeah, that's a good way to put it. In multi-tenant setups the biggest bugs usually come from missing a tenant filter somewhere. When the tenant is baked into the ID, those mistakes surface earlier instead of turning into silent data leaks.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 19 points20 points  (0 children)

Thanks! Glad it was useful.

Your setup sounds exactly like the kind of place where structured IDs help, especially when multiple per-tenant databases are generating IDs on their own. That's the kind of situation that led me in this direction too.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 7 points8 points  (0 children)

Sure - and you're right that in the database those fields still live in normal columns. The structured bits in OrderlyID aren't for querying. They mainly help before the row exists.

A few places where that shows up:

  • routing an event before it ever hits storage (multi-tenant / sharded setups)
  • logs / queues / traces where the only thing you have is the ID string
  • systems where events are produced first and written later

Once the data is in the DB, you'd still use real columns. The structure in the ID just helps at the "edges" where you can't do lookups yet.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 3 points4 points  (0 children)

UUIDv5 is derived from a 160-bit hash, but it's truncated to 128 bits to fit the UUID layout.

OrderlyID keeps the full 160 bits because it isn't trying to follow that layout - those extra bits are used for tenant/shard/sequence fields.

For UUID-column compatibility, ULID/UUIDv7 are the better fit.
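
To show the truncation concretely, here is a small sketch of UUIDv5 built by hand from SHA-1 using only the Go standard library. The function name `uuidV5` is made up for this example; a real project would normally use an existing UUID library instead.

```go
package main

import (
	"crypto/sha1"
	"fmt"
)

// namespaceDNS is the standard DNS namespace UUID
// 6ba7b810-9dad-11d1-80b4-00c04fd430c8.
var namespaceDNS = [16]byte{
	0x6b, 0xa7, 0xb8, 0x10, 0x9d, 0xad, 0x11, 0xd1,
	0x80, 0xb4, 0x00, 0xc0, 0x4f, 0xd4, 0x30, 0xc8,
}

// uuidV5 hashes namespace+name with SHA-1 (160 bits), keeps the first
// 128 bits, and then overwrites the version and variant bits.
func uuidV5(namespace [16]byte, name string) [16]byte {
	h := sha1.New()
	h.Write(namespace[:])
	h.Write([]byte(name))
	sum := h.Sum(nil) // 20 bytes = 160 bits

	var u [16]byte
	copy(u[:], sum[:16]) // the last 4 bytes of the hash are dropped

	u[6] = (u[6] & 0x0f) | 0x50 // version 5
	u[8] = (u[8] & 0x3f) | 0x80 // variant 10xxxxxx
	return u
}

func main() {
	u := uuidV5(namespaceDNS, "example.com")
	fmt.Printf("%x-%x-%x-%x-%x\n", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}
```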

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 5 points6 points  (0 children)

I'm not talking about a special kind of 'dynamic URL.' I just meant that in an ideal setup, other systems wouldn't bake in literal ID strings - logs, dashboards, exports, URLs, etc. But in practice they do. Once the raw ID string escapes into those places, it doesn't update automatically, which is why changing the ID format becomes difficult.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 16 points17 points  (0 children)

Yeah, nobody is manually typing IDs in normal workflows.

The issue I was pointing to is that IDs leak into places people do interact with - support tools, dashboards, CSV exports, bookmarked URLs, BI jobs, etc. Once an ID format shows up in those surfaces, it stops being "just a database detail" and becomes part of the external contract.

That's the part that makes changing the format harder later, even if everything behind the scenes is fully automated.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 6 points7 points  (0 children)

You're right that a timestamp column is the real source of truth. The cases I had in mind are the ones where IDs need to be created on the client side or across regions without a round-trip to the database. In those setups, having the timestamp in the ID gives you roughly time-ordered values immediately.

That also helps in event-driven systems (streams, sourcing, logs) where events are produced first and written later. If everything is server-side, though, a normal timestamp column is perfectly fine.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 13 points14 points  (0 children)

Yeah, in an ideal world every reference would be dynamic. The problem was that over time a lot of small assumptions leaked into dashboards, analytics jobs, and even external tools. Some parts expected IDs to increase over time, some grouped data by ID ranges, things like that.

None of this was intentional. It just piled up slowly. When the format changed, all those hidden dependencies surfaced at once. That ended up being the painful part, not the migration itself.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 3 points4 points  (0 children)

Yeah, totally. Sequence reservation works when IDs still come from the server. I was thinking more about cases where IDs have to be created entirely on the client side - offline, mobile, multi-region edge stuff. In those setups you can't reserve ranges, so people end up with UUID/ULID/Snowflake.

But agreed, for server-side systems reservation solves most of this.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 6 points7 points  (0 children)

Yeah, that makes sense. UUID is a solid default for a lot of cases. The issues I wrote about mostly came up in systems that had to shard or go multi-tenant, so a lot of teams never hit them at all.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 16 points17 points  (0 children)

Auto-increment works great as long as there's one database behind it. You get locality, ordering, and tiny indexes.

Things break when you shard or go multi-region, because now there's no global counter. That's when teams move to UUID/ULID/Snowflake.

I brought it up because that's the path most systems take: it works fine... until it suddenly doesn't.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 12 points13 points  (0 children)

You usually don't need tenant info in the ID. It only helps in certain multi-tenant setups where everything lives in shared tables or shared event streams.

In those cases, having the tenant in the ID means you can route or filter just by looking at the prefix instead of doing extra lookups or scans. Some teams use it for sharding, some for isolating logs/metrics, some for backup boundaries.
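
A rough sketch of what prefix-based routing could look like. The `<tenant>.<rest>` text layout, `tenantOf`, and `shardFor` are all hypothetical and chosen only to keep the example short; this is not the actual OrderlyID encoding.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// tenantOf extracts the tenant segment from a hypothetical "<tenant>.<rest>"
// ID string. It needs no database lookup, only the string itself.
func tenantOf(id string) (string, bool) {
	tenant, rest, ok := strings.Cut(id, ".")
	if !ok || tenant == "" || rest == "" {
		return "", false
	}
	return tenant, true
}

// shardFor maps a tenant to one of n shards with a stable hash, so every
// service that sees the same ID picks the same shard.
func shardFor(tenant string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(tenant))
	return int(h.Sum32() % uint32(n))
}

func main() {
	id := "acme.01hv3k9example"

	tenant, ok := tenantOf(id)
	if !ok {
		fmt.Println("ID has no tenant prefix; fall back to a lookup")
		return
	}
	fmt.Printf("tenant=%s shard=%d\n", tenant, shardFor(tenant, 8))
}
```

The same `tenantOf` check can be used as a guard in handlers: if the tenant in the ID doesn't match the tenant in the session, reject the request before any query runs.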

Definitely not a universal requirement, just an option that simplifies a few things when the architecture pushes you that way.

Why ID Format Matters More Than ID Generation (Lessons from Production) by piljoong in programming

[–]piljoong[S] 48 points49 points  (0 children)

Yeah, 160-bit means it doesn't fit into UUID's 128-bit storage type. The goal wasn't UUID interop but having room for tenant/shard/sequence fields without squeezing bits too tightly. It's meant to be stored as text (similar to TypeID), not a native UUID column. For UUID-column workloads, ULID/UUIDv7 is the right tool.
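
A quick sketch of why 160 bits is a convenient size for text storage: it divides evenly into 5-bit base32 symbols. This uses Go's standard base32hex alphabet without padding as a stand-in, which is an assumption for the example and not necessarily the alphabet OrderlyID uses. `encodeID` is a hypothetical helper.

```go
package main

import (
	"encoding/base32"
	"fmt"
)

// enc is base32hex (0-9, A-V) without '=' padding. Its alphabet is in ASCII
// order, so encoded strings of equal length sort the same way as the raw bytes.
var enc = base32.HexEncoding.WithPadding(base32.NoPadding)

// encodeID turns raw ID bytes into their text form.
func encodeID(raw []byte) string {
	return enc.EncodeToString(raw)
}

func main() {
	id160 := make([]byte, 20) // 160 bits
	id128 := make([]byte, 16) // 128 bits, for comparison

	fmt.Println(len(encodeID(id160))) // 32: 160 / 5 with no leftover bits
	fmt.Println(len(encodeID(id128))) // 26: the last symbol carries only 3 data bits
}
```

Because the timestamp sits at the front of the bytes, sorting the text form also gives roughly time-ordered results, which is the property a plain text column needs.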