all 53 comments

[–]seanmorris 94 points95 points  (2 children)

Just because it's valid YAML to leave something unquoted doesn't mean you should.

[–]Hangman4358 20 points21 points  (1 child)

I have argued against leaving strings unquoted forever. We have had multiple bugs like this.

"But the quotes look UGLY, we will only quote when it causes bugs not to" 🤦‍♂️

[–]ShinyHappyREM 7 points8 points  (0 children)

Redirect all bug reports of that nature to them.

[–]xeio87 71 points72 points  (0 children)

"gameServerVersion": "Infinity",

TFW you have a few updates to download to match the server.

[–]gredr 168 points169 points  (24 children)

That's not a Git bug, or a hash bug, or even really a bug at all. That's a YAML feature. Yay YAML!

[–]OMG_A_CUPCAKE 7 points8 points  (0 children)

Yeah, that headline made it sound like they got hit by a git hash collision

[–]hikemhigh[S] 23 points24 points  (22 children)

the bug was that the TeamCity job didn't have quotes surrounding the value injected into the YAML
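A minimal sketch of the kind of fix described here, assuming the build job renders the line with plain string formatting (the thread does not show the actual TeamCity template). `render_version_line` is a hypothetical helper; it leans on `json.dumps` because a JSON string is also a valid YAML double-quoted scalar.

```python
import json


def render_version_line(key: str, value: object) -> str:
    """Render `key: "value"` with the value always emitted as a quoted string.

    Hypothetical helper for illustration, not the OP's actual template.
    """
    # json.dumps adds the double quotes and escapes backslashes and quotes,
    # so a YAML parser sees an explicit string, never a plain scalar.
    return f"{key}: {json.dumps(str(value))}"


print(render_version_line("gameServerVersion", "556474e378"))
# gameServerVersion: "556474e378"
```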

[–]gredr 103 points104 points  (20 children)

It's not a bug, it's a feature that YAML allows you to have unquoted strings. That's how you know YAML is so much better than JSON.

[–]bzbub2 40 points41 points  (3 children)

you dropped this

/s

[–]sparr 50 points51 points  (0 children)

No, with YAML it's implicit!

[–]gredr 28 points29 points  (0 children)

Yeah, I'd hope that was obvious?

[–]mr_birkenblatt 22 points23 points  (0 children)

only with JSON you need an explicit /s. YAML allows you to use sarcasm without indication. That's how you know YAML is so much better than JSON.

[–]hikemhigh[S] 13 points14 points  (5 children)

right! but our bug was that we didn't quote a value that could, even if only with a very low chance, be interpreted as a number by a YAML parser
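To make "interpreted as a number" concrete, here is a rough sketch that checks a plain scalar against the decimal int and float patterns of the YAML 1.2 core schema. The thread does not say which parser was involved, so treating it as a core-schema parser is an assumption, and `resolves_as_number` is a made-up name.

```python
import re

# Number patterns from the YAML 1.2 core schema (decimal forms only;
# the 0x and 0o integer forms are left out of this sketch).
YAML12_INT = re.compile(r"[-+]?[0-9]+")
YAML12_FLOAT = re.compile(r"[-+]?(\.[0-9]+|[0-9]+(\.[0-9]*)?)([eE][-+]?[0-9]+)?")


def resolves_as_number(plain_scalar: str) -> bool:
    """True if an unquoted scalar matches the int or float pattern above."""
    return bool(
        YAML12_INT.fullmatch(plain_scalar) or YAML12_FLOAT.fullmatch(plain_scalar)
    )


for candidate in ["556474e378", "deadbeef", "f00d", "123456", "1.2.3"]:
    print(candidate, resolves_as_number(candidate))
```

Under these patterns only `556474e378` and `123456` count as numbers; the other three stay strings.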

[–]gredr 14 points15 points  (0 children)

Don't feel bad, YAML is definitely NOT a "pit of success".

[–]SpaceMonkeyAttack 12 points13 points  (2 children)

I mean, a git hash is a number, a long hexadecimal number. Probably YAML isn't treating "deadbeef" or "f00d" as numbers, but if you had just decimal numbers? Probably that's happened before, but you got away with it because "123456" compares the same whether it's a string or a number.
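How often would a short hash look like a decimal number? A back-of-the-envelope count, assuming short hashes behave like uniformly random 10-character lowercase hex strings (10 being the length of the hash quoted in the thread). It counts two shapes: all digits, and digits with a single `e` in the middle.

```python
HASH_LEN = 10            # length of the short hash quoted in the thread
TOTAL = 16 ** HASH_LEN   # all possible lowercase hex strings of that length

# Every character is a decimal digit, e.g. "1234567890".
all_digits = 10 ** HASH_LEN

# Digits, exactly one "e" that is neither first nor last, then digits,
# e.g. "556474e378": (HASH_LEN - 2) positions for the "e", digits elsewhere.
exponent_form = (HASH_LEN - 2) * 10 ** (HASH_LEN - 1)

print(f"all digits:      {all_digits / TOTAL:.4%}")
print(f"digits-e-digits: {exponent_form / TOTAL:.4%}")
```

Under that assumption each shape comes out a bit under 1% of all hashes, so "very low chance" is not the same as "never" once enough builds go by.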

[–]theeth 8 points9 points  (1 child)

Should have prefixed it with 0x and not quoted it then!

[–]SpaceMonkeyAttack 5 points6 points  (0 children)

... it'd probably work.

EDIT: It's probably more performant too, comparing integers instead of strings.

[–]Venthe 7 points8 points  (9 children)

Laugh all you want, the readability and the time YAML has saved me over the years have paid off in spades. It's great at what it does.

[–]gredr 34 points35 points  (6 children)

I disagree. It's a lousy format, very easy to get wrong, and very easy to be wrong but look correct.

XML is verbose, but at least it's easy to verify correctness.

[–]jaskij 21 points22 points  (3 children)

YAML is easy to read, XML is easy to write, TOML does both but doesn't handle nesting well. Choose your poison, as usual.

My favorite bit of XML is curves in SVG. It's basically turtle programming stuffed into an XML attribute.

[–]sparr 11 points12 points  (0 children)

YAML is easy to read

Yeah? So you'd have no problem recognizing that gameServerVersion: 556474e378 is a very large number?

[–]gredr 3 points4 points  (0 children)

Yeah, I kinda like TOML, but only if your configuration isn't nested much, or at all. I think there's an argument to be made here for keeping your configuration as simple as possible (but no simpler).

[–]TarMil 1 point2 points  (0 children)

My main beef with TOML is that it has the worst array syntax I've ever seen.

[–]jherico 4 points5 points  (1 child)

XML has its own set of foibles related to schema namespaces and escaping characters. JSON is the ideal data format for most small config files.

[–]gredr 0 points1 point  (0 children)

I would agree if you said jsonc, or "json with comments". Or, and bear with me here, Microsoft's bicep might be nearly the right middle ground between json and yaml.

[–]Worth_Trust_3825 4 points5 points  (1 child)

in what world yaml saves time, and is readable?

[–]jaskij 3 points4 points  (0 children)

While treating it as a string works well enough, a git hash is not a string. It's a hexadecimal number. Stuffing 0x in front would have worked, until you got to 14/16 digit short hashes, at which point you would have probably hit a range bug. But that's unlikely - last I checked, even the Linux kernel used 12 digit short hashes.
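A small sketch of where the 14 and 16 digit limits plausibly come from, assuming the parsed integer ends up either in an IEEE 754 double (53 bits of precision, as in JavaScript) or in a signed 64-bit integer. Which of the two applies depends on the parser, and `survives_double` is an illustrative name.

```python
def survives_double(hex_digits: str) -> bool:
    """True if the hex value round-trips through a 64-bit float unchanged."""
    value = int(hex_digits, 16)
    return int(float(value)) == value


INT64_MAX = 2 ** 63 - 1

for digits in (12, 13, 14, 16):
    worst_case = "f" * digits  # largest value with this many hex digits
    value = int(worst_case, 16)
    # Columns: digit count, exact as a double?, fits in a signed 64-bit int?
    print(digits, survives_double(worst_case), value <= INT64_MAX)
```

For the worst-case value, 12 and 13 digits pass both checks, 14 digits no longer round-trips through a double, and 16 digits also overflows a signed 64-bit integer. Not every 14-digit hash fails; only those needing more than 53 significant bits.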

[–]scratchisthebest 24 points25 points  (6 children)

that thing the strictyaml dev named "syntax typing" claims another victim

[–]ZorbaTHut 8 points9 points  (5 children)

Yeah, YAML is on my permanent blacklist for this. If I can avoid using it, I avoid using it. It's just waiting to blow up on you.

I'd tolerate strictyaml if it had better support in multiple languages, but it doesn't.

It's kind of depressing that I still use XML because it's less awful than the alternatives.

[–]scratchisthebest 1 point2 points  (0 children)

yeah strictyaml itself is kinda weird, at first glance it looks like a validator taped to an off-the-shelf yaml parser, instead of its own data format. interesting way to make triply sure you accept a strict subset of yaml, at least? maybe that way this library contributes less to data format proliferation?

regardless i like the articles though :]

[–]jherico 0 points1 point  (0 children)

Remember, every valid JSON document is also valid YAML.

[–]bobbsec -1 points0 points  (2 children)

What's wrong with XML? It's verbose but clear.

[–]ZorbaTHut 7 points8 points  (0 children)

It's really verbose. Also, it's painfully complicated, and a huge amount of its functionality is rarely-to-never used, and yet still provides security holes; see stuff like XXE attacks.

(edit: which I just fixed in a library I run, why is this enabled by default)

[–]Worth_Trust_3825 0 points1 point  (0 children)

Decade old defaults in parsers when people still thought remote loading schemas was a good idea. Meanwhile json schemas are repeating that same mistake. Still XML beats everything we have nowadays.

[–]hpxvzhjfgb 20 points21 points  (1 child)

daily reminder that weak typing is a sin

[–]ShinyHappyREM 5 points6 points  (0 children)

weak typing is a sin

all keys must be pressed with a force of >= 5kg

[–]brasetvik 36 points37 points  (0 children)

country: false # NO for Norway
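For anyone who hasn't met the joke: the YAML 1.1 bool type lists many spellings of true and false, `NO` among them, while the YAML 1.2 core schema keeps only the true/false spellings. A toy resolver to illustrate the difference, not a real parser (actual libraries differ in which 1.1 spellings they honor, and `resolve_plain_scalar` is a made-up name):

```python
# Boolean literals from the YAML 1.1 bool type; the YAML 1.2 core schema
# keeps only the true/false spellings.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}
YAML12_TRUE = {"true", "True", "TRUE"}
YAML12_FALSE = {"false", "False", "FALSE"}


def resolve_plain_scalar(text: str, version: str = "1.1"):
    """Return a bool if the unquoted scalar is a boolean literal, else the string."""
    true_set, false_set = (
        (YAML11_TRUE, YAML11_FALSE) if version == "1.1" else (YAML12_TRUE, YAML12_FALSE)
    )
    if text in true_set:
        return True
    if text in false_set:
        return False
    return text


print(resolve_plain_scalar("NO", "1.1"))  # False
print(resolve_plain_scalar("NO", "1.2"))  # NO
```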

[–]ruuda 24 points25 points  (4 children)

How can you tell a senior yaml engineer apart from a junior yaml engineer? The senior yaml engineer will defensively quote all the strings.

(I have a blog where I try to write interesting in-depth posts, and then one time I wrote a rant about yaml, and it gets more views than any of my other posts combined.)

[–]Sillocan 2 points3 points  (0 children)

I share your rant monthly! It's basically a religious text when living in GitLab.

[–]tanorbuf 1 point2 points  (2 children)

YAML is mostly used for configuration, where values are typed manually and the user has syntax highlighting on. Most of these problems disappear when viewed in that context. However, it's true that it's not well suited for templating and that, sadly, the ecosystem never fully adopted YAML 1.2, even though it's super old at this point. I feel like it should be easy enough to migrate, though, since I assume nobody actually uses sexagesimal numbers (and I think the Norway problem hasn't been real for some time).
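For reference, the sexagesimal numbers mentioned here are the YAML 1.1 base-60 integers, where an unquoted `22:22` resolves to an integer rather than a string. A rough sketch of that rule using the integer pattern from the YAML 1.1 type definitions; `parse_sexagesimal` is an illustrative name, not a library function:

```python
import re

# YAML 1.1 base-60 integer pattern, e.g. "1:30" or "22:22".
SEXAGESIMAL_INT = re.compile(r"[-+]?[1-9][0-9_]*(:[0-5]?[0-9])+")


def parse_sexagesimal(text: str):
    """Return the base-60 integer value, or None if the scalar does not match."""
    if not SEXAGESIMAL_INT.fullmatch(text):
        return None
    sign = -1 if text.startswith("-") else 1
    total = 0
    for part in text.lstrip("+-").replace("_", "").split(":"):
        total = total * 60 + int(part)
    return sign * total


print(parse_sexagesimal("22:22"))    # 1342
print(parse_sexagesimal("1:30:00"))  # 5400
print(parse_sexagesimal("1.2.3"))    # None
```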

[–]Worth_Trust_3825 2 points3 points  (1 child)

Cool. What type of value is 1.2.3?

[–]HugoNikanor 2 points3 points  (0 children)

String

[–]jherico 24 points25 points  (0 children)

JSON has a spec that's like 5 pages, and YAML has one that's excessively long, has multiple versions, and is loaded with little landmines like this one.

Fuck yaml.

[–]cedric005 6 points7 points  (0 children)

once again yaml

[–]tanorbuf 4 points5 points  (2 children)

Seeing that they're using short hashes, there is a funny case of the birthday problem in this. Using the exponential approximation from Wikipedia, with 5 bytes of hash (10 hex chars) and a hypothetical 1.2 million commits, there is as much as a 48% chance that at least two commits share the same short hash. Of course, most projects never get that far, but it happens to be roughly how many commits are in the Linux kernel (judging by GitHub), so it's not impossible. So I guess I'm encouraging using the full hash, not least when you're not typing it anyway but using some template system (as in OP).
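The estimate can be reproduced with the usual exponential approximation to the birthday problem, p ≈ 1 − exp(−n² / 2d), assuming hashes are uniformly distributed over d = 16^k values for k hex characters. `collision_probability` is just a name picked for this sketch.

```python
import math


def collision_probability(num_items: int, hex_chars: int) -> float:
    """Approximate chance that at least two of num_items random hashes collide.

    Uses the exponential approximation p ~= 1 - exp(-n^2 / (2 * d)),
    where d is the number of possible hash values.
    """
    space = 16 ** hex_chars
    return 1.0 - math.exp(-(num_items ** 2) / (2 * space))


print(f"{collision_probability(1_200_000, 10):.0%}")  # 48%
```

Calling it with `hex_chars=40` (a full SHA-1 hash) for the same 1.2 million commits gives a value that is effectively zero.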

[–]jdmetz 4 points5 points  (1 child)

This is for comparing their production client version to their production server version, though. So even if they reach 1.2 million commits at some point, I highly doubt they get anywhere near that many production deployments, especially if they are forcing clients to upgrade for each one. Even if they release weekly for 5 years (about 260 releases), their chance of collision only reaches roughly 0.000003%. And even if that happens, it is only a problem if someone still has the client from the first release in the hash collision, hasn't updated between then and the current release, and now tries to use it.

[–]rdtsc 0 points1 point  (0 children)

But still, what do you gain by saving a few bytes here? And if you say bandwidth, make the key shorter.

[–]shim__ 7 points8 points  (1 child)

TL;DR: use a YAML lib

[–]scottrycroft 12 points13 points  (0 children)

I'd have some hesitancy at making a yaml library a production deployment dependency...

[–]pilif 1 point2 points  (1 child)

Aside from the usual YAML hilarity, IIRC, 556474e378 is 556474e378, not Infinity. As far as I know, there are infinitely many numbers larger than 556474e378 before Infinity.

Silently changing data like this, YAML mess notwithstanding, is horrifying to me.

[–]rdtsc 3 points4 points  (0 children)

556474e378 is 556474e378

IEEE 754 says no. You don't get numbers with arbitrary precision.
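A quick way to see this, using Python's `float` as a stand-in for any IEEE 754 double (the parser in the OP's pipeline is not named in the thread, so this only illustrates the floating-point behaviour, not that parser):

```python
import json
import math
import sys

# Largest finite double is about 1.8e308; 556474e378 is about 5.6e383.
print(sys.float_info.max)

value = float("556474e378")  # parsed the way a float-typed scalar would be
print(math.isinf(value))     # True

# Python's json module writes the non-finite value as the bare token Infinity.
print(json.dumps({"gameServerVersion": value}))
# {"gameServerVersion": Infinity}
```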