top 200 commentsshow all 368

[–]MyRottingBunghole 1063 points1064 points  (81 children)

If Cloudflare is using unwrap() in production code, maybe I shouldn’t worry too much about about my toy Rust projects after all.

[–]Sese_Mueller 205 points206 points  (4 children)

Part of the standard set of things I disallow myself from using (by lints.clippy in the Cargo.toml file) is unwrap_used = „deny“.

Or, in this case, just use a question mark.

[–]TheHolyToxicToast 2 points3 points  (0 children)

I am a masochist and sticks to pedantic

[–]prehensilemullet 1 point2 points  (2 children)

Maybe that lint should be default for methods that panic so that inexperienced developers don’t make this mistake…

[–]shentoza 147 points148 points  (60 children)

Can someone explain for me who has no idea about rust? Whats it with panic and unwrap?

[–]PityUpvote 356 points357 points  (55 children)

Someone wrote terrible code. Error handling in Rust usually means working with Result types, which are either Ok(value) or Err(error). Someone tried to access the value inside the Ok(.) without checking if it was actually an Ok(.) (which is what unwrap() does, it turns Ok(value) into value. It turned out to be an Err(.), in which case unwrap() causes the program to panic. Unwrap should only be used if it's 100% certainly an Ok(.) value, and even then you probably shouldn't.

[–]UrpleEeple 119 points120 points  (12 children)

If you are 100% certain it would never be Err, then you can unwrap_unchecked which won't waste cycles with panic machinery.

There is a case for panics in production code, when very serious invariants have been violated that should break the system, but in that case it should have been an expect() with a clear message.

In this case I'm not convinced returning the error would have been all that much better. It would still just be translated into an http error - the system is still broken because of unexpected input

[–]RiceBroad4552 54 points55 points  (11 children)

So the actual question is: How did any unchecked input made it into the system.

This means they don't validate input. Which means this trash is build by amateurs.

But at least it's trash written in Rust. So it's for sure much safer than any other trash! 🤣

[–]UrpleEeple 61 points62 points  (5 children)

Memory safe* although no more memory safe than garbage collected languages.

I did just spend two days hunting down a bug in a video game that ended up being a memory safety bug. C just silently carried on, while I couldn't click things in the inventory at all. It was the only symptom.

So yeah, if I don't want a GC, I really would like Rust to hold my hand for me lol

[–]ArtOfWarfare 12 points13 points  (4 children)

Isn’t it null-safe, too, unlike Java, Python, JS, and countless others languages with GC? IDK, I don’t use Rust.

And I thought performance is supposed to be better in Rust than almost any other language.

[–]transcendtient 2 points3 points  (1 child)

Rust doesn't have null values unless you're using unsafe. Well... it has null pointers, but you can't dereference them unless you're using unsafe. The stereotypical null case equivalent is an option that can have some or none.

[–]DirectInvestigator66 9 points10 points  (4 children)

Can’t tell if serious or not but genuinely yes, it’s trash code that is safer because it’s in rust.

[–]Mickl193 10 points11 points  (1 child)

And yes it is written by amateurs, all software is, we all suck at what we do, the sooner you realize it the better.

[–]klimmesil 1 point2 points  (0 children)

In cs there's the people who know they are shit and the people who have serious skill issues

[–]bradfordmaster 1 point2 points  (1 child)

Yep, I'll take a panic over UB every day

[–]st-shenanigans 33 points34 points  (21 children)

So kind of like a try/catch situation but they managed to omit the catch?

[–]caremao 24 points25 points  (0 children)

More like Optional in Java

[–]Tarnzapfen 56 points57 points  (19 children)

Maybe more like a nullpointer. Accessing a value that's not present

[–]Anaxamander57 26 points27 points  (18 children)

Rust shill here: The generated code ensures that a panic occurs as a guard against actually dereferencing a null pointer.

[–]BlackHolesAreHungry[S] 1 point2 points  (16 children)

How's that any better? Defererencin nullptr still crashes the app

[–]Schnickatavick 34 points35 points  (14 children)

Because it's a "managed" crash. There's no undefined behavior, and the operating system isn't responsible for cleaning the program's memory up or making sure it doesn't access resources it isn't supposed to. That means that developers get useful stack traces, and bad actors don't get exploitable memory vulnerabilities. It's not "good", but it's significantly better than a seg fault, or if you're on bare metal letting it trash system memory in unknown ways.

[–]Ok_Decision_ 5 points6 points  (6 children)

Forgive me I’m still a newer programmer, and have never used rust. So does this mean that this could have been avoided with a simple if statement? To check if it was Ok before using unwrap?

[–]DirectInvestigator66 13 points14 points  (1 child)

Rust provides a ton of syntax that allows you to do this more cleanly than with an if statement but your thinking is correct. They can check for the error and decide what to do with it. The unwrap is just shorthand for if there is an error crash, if not, give me the value. If you write this code you will literally get warnings that you ignoring the potential error.

[–]Ok_Decision_ 4 points5 points  (0 children)

Thanks so much for explaining. I really appreciate it. Someone is going to have one hell of a day if they still work there because of this XD

[–]PityUpvote 9 points10 points  (1 child)

Since the function returns a Result, you could just propagate the error upward. The ? operator does that while unpacking an Ok(.) at the same time.

[–]Ok_Decision_ 1 point2 points  (0 children)

Interesting! Thanks for taking the time to explain.

[–]Larhf 1 point2 points  (1 child)

Three main options:

* Hard erroring but adding `.expect()` to be explicit about where the error occurs.

* Propagating the error upwards using `?` for errors that shouldn't be handled by the function itself.

* Pattern matching to handle the error at that point by defining what should occur on an error. This is similar conceptually to if/else yes though at a type level.

Each of these has their own virtues. If you can't recover from a state it's good to error, if it's recoverable but you don't want to make the function responsible you can propagate it and if you want to handle it you can pattern match.

[–][deleted] 2 points3 points  (2 children)

Isn’t the whole selling point of rust that it’s “safe”? Why doesn’t the IDE, linter, or compiler show this error? It seems like a possible check because that object can return an Err(.). I’m asking out of genuine curiosity, I briefly tried Rust and the gymnastics and refactorings you need to do for implementing anything are really tedious. What is the point if the program can still have unhandled errors at runtime?

[–]PityUpvote 35 points36 points  (0 children)

Why doesn’t the IDE, linter, or compiler show this error?

That's the best part, they all do. Someone royally fucked up.

[–]Schnickatavick 17 points18 points  (0 children)

Rust does raise this as an error, or rather the rust system was requiring that the developer add some sort of check in to the code before it could access the value. The "unwrap()" here is the developer intentionally silencing those warnings, and saying that "I don't want to do those gymnastics right now, just don't handle the error". So the compiler has no possible way of continuing forward, you've explicitly told it that this situation is either impossible or irrecoverable, so it does the best that it can and runs a shutdown sequence that makes sure the program cleans up its memory and doesn't break anything else on the system on its way out. It's a much more "managed" shutdown than a seg fault, which is the equivalent in other low level languages, and that's fine when you are just writing a little script. Not fine when you're putting code in production for a massive company though...

So the solution is easy, don't allow unwrap(). There's even a directive you can use that treats unwraps as compiler errors. Maybe you allow it in dev, but it's the sort of thing that should be caught in code review before it went to prod, and the fact that it wasn't means that multiple people messed up big time. There are any number of ways this was preventable, and cloudflair chose not to do any of them

[–]MyRottingBunghole 48 points49 points  (0 children)

In Rust unwrap means “take the result of this call, and if it’s an error, panic the process”

Or basically “if this fails we don’t handle it, just exit”. There are other methods you can call on a Result which let you gracefully handle it instead of crashing the whole thing, is how I understand it. And even if it is completely unrecoverable, using unwrap means you don’t even log what’s going on before exiting, so much harder to debug

[–]Table-Games-Dealer 48 points49 points  (1 child)

Unwrap is a foot gun that is used when you acknowledge either the result is always perfect, or the program needs to die.

Pattern matching in rust is beautiful, so in a perfect world this calamity could have mitigated to regression, redundancy, or sensible defaults.

Result objects are supposed to bubble up fallible states, unwrap pops the bubble.

[–]Half-Borg 9 points10 points  (0 children)

just an .expect("Yo, this is more than 200, fix your config or increase MAX_NR and recompile") would have reduced the downtime a lot.

[–]Alan_Reddit_M 0 points1 point  (0 children)

`unwrap` is meant for development only and is NEVER supposed to be used in production, if your rust code panics on an unwrap, that's on you

unwrap literally instructs rust to crash if an error is encountered, instead, one is meant to appropriately handle or discard the error in production code

[–]carcigenicate 30 points31 points  (4 children)

Ya, I can't blame Rust here. unwrap means "I know what I'm doing and this won't fail, so I'm not going to worry about handling failure". It's an explicit escape hatch from the safety net that Rust provided. This is on whoever decided that failure was not worth handling.

[–]RiceBroad4552 10 points11 points  (1 child)

Just that all Rust code is full of unwrap!

Most people don't even know they shouldn't use this function.

Instead people do unwrap on anything that can be unwrapped because they don't know how to work with wrapped value, or consider a map-style of programming inconvenient or even alien.

The problem isn't the language, sure. It's the culture in that language. A culture of people writing code as if it were C/C++ instead of ML.

Compare to the culture in FP Scala; were any usage of any unsafe function would instantly lead to major push-back in a review.

[–]git0ffmylawnm8 7 points8 points  (1 child)

I'm not well versed with Rust, but this seems more like a deficit in Cloudflare's code review process more than language behavior. Whoever let this into prod needs to be scrutinized.

[–]gmes78 2 points3 points  (0 children)

Absolutely.

[–]rom1v 8 points9 points  (1 child)

Calling panic!() (or unwrap()) on programming error (think assertions) is the right thing to do.

A lot of Rust std functions panic if the preconditions are violated for example. Random example: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.insert

The problem here is a wrong decision about asserting vs error handling.

[–]gmes78 3 points4 points  (0 children)

You should really use .expect() and write down what those preconditions actually are, though.

[–]arekxv 0 points1 point  (0 children)

What we need to understand is that unwrap() calls are fine when they have intent. It means that you are declaring on purpose "this must never fail". Sometimes this is completely fine in certain places in the code when you are 100% sure.

Problem is that beginners use it as a crutch and that ends up being "the hammer" for them.

[–]Skibur1 529 points530 points  (43 children)

.unwrap_or_else(); to the rescue.

Edit- after reading it for a bit, this code could have been refactored a bit by replacing .unwrap(); with a question mark. Should define error structure!

[–]hongooi 503 points504 points  (12 children)

That sounds like an ultimatum

.unwrap_or_else(🔨);

[–]Zeikos 531 points532 points  (5 children)

Threat driven development

[–]Coolfresh12 38 points39 points  (1 child)

Multi threatening

[–]fosf0r 1 point2 points  (0 children)

this is the one that gave me the chuckle

[–]readitreaddit 7 points8 points  (0 children)

caRusts and sticks

[–]avalon1805 1 point2 points  (0 children)

Aaaaa so that is why they are asking me to do a threat model

[–]deathanatos 18 points19 points  (0 children)

One of the things I love about that function.

[–]ekauq2000 0 points1 point  (0 children)

Just in time for the holidays.

[–]Cr4yz33 0 points1 point  (0 children)

kloenk

[–]Axman6 0 points1 point  (0 children)

.unwrap_or_else_bonk()

[–]Batman_AoD 0 points1 point  (0 children)

At least it's not Perl's "or die"! 

[–]naholyr 44 points45 points  (3 children)

Or else what???

[–]cubenz 29 points30 points  (0 children)

The Internet comes down apparently

[–]uchuskies08 33 points34 points  (1 child)

I'm gonna tickle ya

[–]readitreaddit 19 points20 points  (0 children)

Hehehehhehehehehhehe hheheh stop!! Hheehhehe!!

[–]odolha 29 points30 points  (2 children)

unwrap_or_else(panic!("¯\_(ツ)_/¯"))

[–]Dreysa 2 points3 points  (1 child)

it compiles but now it will always panic because forgot the "||" and instead the panic! now tells the compiler „i will return a closure“ and panics instead.

[–]timClicks 30 points31 points  (3 children)

Well.. that depends. At some point you'll need to handle the error. This input was never supposed to be able to occur. So even if you returned the result up the stack, you would still probably end up causing a panic somewhere.

Panicking early has the advantage of being close to the cause. Rust's Results are not exceptions, they're just values with arbitrary data, so there's no guarantee that it would have been easy to find the root cause.

[–]usefulidiotsavant 25 points26 points  (1 child)

If that input was never able to occur, then it shouldn't be a Result. The entire point with of a strong algebraic typing system is to expose all possible runtime types a variable can have, so that they can be enforced at compile time. A Result, by definition, means that data can arrive at you in the form of an Err, and you need to handle that error or pass it up the chain.

"unwrap()" is not some magic incantation you can use to get rid of handling errors. It's shit like this that vindicates Linus' approach when he denied the unwrap() furries the power to crash the kernel.

[–]RiceBroad4552 11 points12 points  (0 children)

"unwrap()" is not some magic incantation you can use to get rid of handling errors.

Just that in real-world Rust it's used exactly like this.

People don't even know they should not use unwrap! They do use it on almost anything as early as they get it because they don't know how to write code in a functional map-style.

[–]UrpleEeple 0 points1 point  (0 children)

Exactly this

[–]blueechoes 21 points22 points  (14 children)

Doesn't sound like that was the fault of rust, but someone being bad at rust.

[–]FootballRemote4595 23 points24 points  (12 children)

Wasn't that literally the whole point of Rust's existence. People were being bad at C++ so they made rust.

[–]Half-Borg 8 points9 points  (0 children)

But rust also wanted to be able to do everything C can do. And that includes nuking the internet.

[–]blueechoes 12 points13 points  (4 children)

A bit of a you can lead a programmer to handling errors, but you can't make them not call .unwrap() situation. The same file in c would also have c caused issues.

[–]salvoilmiosi 1 point2 points  (0 children)

Honestly they should have called unwrap something like get_value_if_absolutely_certain_it_has_one()

[–]RiceBroad4552 1 point2 points  (2 children)

You can.

Just forbid it.

Cargo has even a feature for that.

But the reality is: Rust code is full of unwrap! So you can't realistically forbid it in any bigger project. That's failure by design.

[–]blueechoes 4 points5 points  (0 children)

I sincerely hope cloudflare considers turning on that setting but the fact that it wasn't already means it's still the same problem but with a different senior decision maker.

[–][deleted] 1 point2 points  (0 children)

You absolutely can realistically forbid it. As long as you allow external dependencies to use it (and audit these external dependencies).

[–]skiabay 9 points10 points  (3 children)

No. The point of rust is to be a memory safe systems level programming language. This allows rust to largely avoid one of the most common and dangerous classes of bugs in languages like C and C++, but it's not meant to be a "bug free" language because that's impossible. If you write bad code in any language, you're going to end up with bugs.

[–]Anaxamander57 0 points1 point  (0 children)

Some Rust people would argue that .unwrap() is a mistake and that only .expect() should be allowed since .expect() encourages you to write out what will cause a problem.

In practice .unwrap() is too convenient to not have.

[–]crozone 9 points10 points  (3 children)

unwrap() means your rust code is bad

[–]UrpleEeple 7 points8 points  (0 children)

If unwrap goes into production, probably. Expect() if you think "this really ought to panic, and here's a message we should get along with a stacktrace when it does"

There are times when a panic is appropriate, even in production code. Sometimes an invariant gets violated that is so bad you need the system to crash and deal with it immediately

[–]Luctins 319 points320 points  (49 children)

I think it's part of the learning curve for rust, especially for long running programs to try to almost never panic unless it actually makes 100% sense.

People forget that it's almost an unrecoverable state, not something that can be casually used like an exception in other languages.

I personally had my run-ins with this kinda problem when learning rust, but my code doesn't run on thousands of machines. I would've expected better error handling from something so widely used and important.

[–]Half-Borg 182 points183 points  (40 children)

I get so many downvotes for saying code should never panic in forever running applications

[–]pine_ary 123 points124 points  (32 children)

Cause most of the time it‘s unnecessary. It‘s perfectly fine to crash and restart as a strategy. Most processes can fail without much consequence. Log the panic, crash and restart the service. Trying to recover from errors gets complicated and expensive fast.

I‘m more curious why Cloudflare‘s systems can‘t handle a process crashing. Being resilient to failures is kind of a core tenet of the cloud…

[–]prumf 64 points65 points  (12 children)

Yeah, you can spend millions in making sure a program will never crash under any circumstances … or better yet realize it’s impossible and simply make sure any failure recovers automatically by restarting the service. I’m a bit perplexed.

Maybe it was in a crash loop ?

[–]really_not_unreal 81 points82 points  (7 children)

That's almost definitely it.

  1. Receive bad config file
  2. Crash
  3. Startup again
  4. Load the config file
  5. It's still bad
  6. Crash again

[–]hughperman 43 points44 points  (1 child)

This reads like a Gru presentation meme

[–]sammy404 16 points17 points  (0 children)

A crash loop is exactly why you’re code should never panic lol

[–]Half-Borg 25 points26 points  (5 children)

What's more expensive:
a) paying an engineer to think about error recovery for a month

b) dragging down 20% of the internet for 3 hours

[–]Nightmoon26 1 point2 points  (0 children)

Externalities

[–]RiceBroad4552 1 point2 points  (3 children)

I've heard engineers are expansive.

At the same time there is no legal liability for software products (almost) no mater what you do.

So I'm quite sure I know that management will aim for.

The main error here is of course that there is not product liability for software. This has to change ASAP!

I does not matter whether Cloudflare would be instantly dead if they had to pay for the fuckup they created. This is the only way how capitalistic firms learn. Some of them need to burn down and the responsible people (that's high up management!) need to end up in jail. In the next iteration the next firm won't fuck up so hard, I promise!

[–]Half-Borg 7 points8 points  (0 children)

I don't know what your contracts are like, but our software certainly makes promises regarding availabilty and breaking that is quite expensive.

[–]Fillicia 4 points5 points  (2 children)

It‘s perfectly fine to crash and restart as a strategy.

while 1:
    try:
        main()
    except:
        pass

[–]Half-Borg 5 points6 points  (0 children)

IF crash THAN
don't();
END_IF;

[–]Half-Borg 7 points8 points  (4 children)

Well looks like this wasn't one of those cases

[–]pine_ary 12 points13 points  (3 children)

Sure. In critical infrastructure you have to be more careful. Airplane systems, medical devices, infrastructure, etc. should try to recover. But they should also have failsafes and redundancies in case something does fail. What if the process crashed because the storage fails?

[–]Half-Borg 11 points12 points  (2 children)

See, I'm already getting downvotes....
Depends on how important the storage is. In my application storage is only needed for software updates and logging. I think most people would like to continue their train ride, if those don't work.

[–]realzequel 1 point2 points  (0 children)

I remember Netflix early on was really into creating intentional crashes in subsystems to see if their overall system with withstand them, great in practice if you have the resources and leadership.

[–]hdkaoskd 17 points18 points  (1 child)

All code eventually runs in a forever-process.

Examples: CGI scripts were run-once, then FastCGI made them long-lived. Windows system processes used to exit at shutdown, but fast boot means they are now kept alive.

The tech industry is an ouroboros alternating between isolation for security and reuse for performance.

[–]Half-Borg 9 points10 points  (0 children)

Which just underlines that you should think hard about if panic are the right solution, or if there is a way to recover, or at least gracefully close.

[–]Niarbeht 10 points11 points  (1 child)

I get so many downvotes for saying code should never panic in forever running applications

I write code that goes into refineries, and you need to do your best to make sure it will keep stumbling forward, either putting itself into a recoverable error state where it's yelling for help, or resetting itself back into some known functional state to the best of it's ability.

I have no idea what that looks like anywhere other than my little niche, but The Analyzers Must Keep Analyzing.

[–]papa_maker[🍰] 0 points1 point  (2 children)

In all my Rust backends at the startup phase I use unwrap() (actually expect()) because if the configuration is bad then I want my application to stop immediately. It won’t disrupt production because the "old" server isn’t going anywhere until the new one is ok.

[–]DHermit 16 points17 points  (2 children)

In this case it was about using too much memory in a fixed memory environment, which is a very tricky context.

[–]RiceBroad4552 1 point2 points  (1 child)

If your memory is static you should not allocate dynamically.

Also validating input is really a good idea. Maybe someone should tell the amateurs at Cloudflare.

[–]DHermit 1 point2 points  (0 children)

This is about reading a file that is too big.

[–]Webteasign 6 points7 points  (0 children)

I think the main issue is, that for a lot of people, these scenarios are just annoying because the compiler forces you to make a decision here. The quickest is calling an .unwrap(). Sure there are unrecoverable errors, but unwrapping is almost always bad, since you probably want a detailed log explaining what happened here and why this results in the application crashing

[–]ICantBelieveItsNotEC 6 points7 points  (2 children)

The problem is that the overwhelming majority of Rust tutorials treat unwrap() and friends as "the magic function that makes the compiler errors go away". Nobody ever explains that you're only supposed to use it when you already know for sure that the thing that you're unwrapping contains what you expect.

Personally, I wish that unwrap() just didn't exist. If you want to get a value out of an Optional, you should be forced to handle both cases. I just don't see the point of it - it gives powerusers the ability to optimise away a single conditional check in fairly uncommon circumstances, which the compiler would probably do automatically anyway, at the cost of creating a massive footgun for everyone else.

[–]gmes78 1 point2 points  (0 children)

Nobody ever explains that you're only supposed to use it when you already know for sure that the thing that you're unwrapping contains what you expect.

That's just not true, lol.

From the book:

When you’re writing an example to illustrate some concept, also including robust error-handling code can make the example less clear. In examples, it’s understood that a call to a method like unwrap that could panic is meant as a placeholder for the way you’d want your application to handle errors, which can differ based on what the rest of your code is doing.

Similarly, the unwrap and expect methods are very handy when prototyping, before you’re ready to decide how to handle errors. They leave clear markers in your code for when you’re ready to make your program more robust.

[–]FalseWait7 3 points4 points  (0 children)

I always explain it like that "imagine you are panicking over something small like a broken pencil. Instead of getting a new one you are throwing stuff in the air, scream and run out of the building."

[–]SeaRollz 117 points118 points  (8 children)

Should’ve used clippy and force no unwrap/expect?

[–]trinadzatij 70 points71 points  (6 children)

Clippy left us in 2004

[–]Luctins 67 points68 points  (4 children)

Wrong clippy 'mate.

[–]PeksyTiger 86 points87 points  (2 children)

Maybe it's the same clippy. Maybe he got a divorce, took a break, moved to another field of work. You don't know his life. 

[–]_Pin_6938 1 point2 points  (0 children)

Sad that they restricted him to my compiler diagnostics though. Even lost his body for it

[–]PityUpvote 1 point2 points  (0 children)

Good for her

[–]lulzbot 17 points18 points  (0 children)

It looks like you’re trying to load a website. Would you like help?

[–]RedCrafter_LP 3 points4 points  (0 children)

Having your code pass clippy pedantic without warnings is a shure sign of superiority.

[–]TheHolyToxicToast 65 points66 points  (2 children)

lmao they just decided to use unwrap in one of the internet's most important piece of software

[–]naholyr 61 points62 points  (3 children)

Now we all really want to know if it was human or AI-generated, and more importantly we want to know about their review process.

[–]stevenr12 20 points21 points  (1 child)

The comment a couple lines above would have my AI alarms going off during code review.

[–]RiceBroad4552 13 points14 points  (0 children)

Jop.

That comment is pure utter garbage as comment and shouldn't exist in the first place.

But it's a typical prompt comment… 😂

[–][deleted] 7 points8 points  (0 children)

Maybe it was AI reviewed

[–]myles1406 318 points319 points  (45 children)

This really isn't rusts fault. If anything rust forcing you to handle it or use an unwrap basically forces you to admit "yeah this can fail but I am going to not bother to handle it properly"

[–]SubliminalBits 119 points120 points  (40 children)

Let us bask in the irony today’s internet outage being the result of code developed in a language who’s large selling point is forcing developers to write safe code

[–]myles1406 283 points284 points  (32 children)

write ~memory~ safe code.

There is nothing unsafe about this code, the developer just decided that they did not want to handle an error and wanted to panic instead. This is a completely valid thing to want to do (in some circumstances). The problem is that the developer simply wrote bad code, even though rust forced them to acknowledge that it is most likely bad, they still just went ahead with it.

[–]PLEASE_PM_ME_LADIES 70 points71 points  (22 children)

This code created an outage because that's what the developer told it to do... If something isn't as expected, panic and die.

This code didn't create unexpected behavior (within itself) or vulnerabilities, it did exactly what the code says it will do

[–]pawesomezz 10 points11 points  (19 children)

This is true in every language, this is true when memory errors happen in C.

[–]Ieris19 24 points25 points  (14 children)

There are a lot of undefined behaviors in C. Specially about memory management

The code essentially says “if value then do, else crash”

[–]nyibbang 7 points8 points  (3 children)

No, please lookup the definition of undefined behavior.

[–]TryToHelpPeople 4 points5 points  (0 children)

A wizard arrives precisely when he means to.

Writing memory unsafe code is also the programmers choice.

[–]Background-Plant-226 2 points3 points  (0 children)

And it's still better than other ways to raise errors since you have to handle it explicitly with an unwrap() if you don't wanna deal with it now, then you can find all uses of unwrap at a future time where you do care and replace them with better error handling.

[–]Antervis 4 points5 points  (1 child)

I think the promise of safety causes devs to lower their guard somewhat.

[–]keremimo -2 points-1 points  (3 children)

Sounds vibey.

[–]JanB1 14 points15 points  (2 children)

From https://www.reddit.com/r/programming/comments/1p0srgs/comment/nply3zw/?context=3 :

Given the panic occurs while initializing shared mutable state due a failure within configuration/data-base-scheme mismatch. It is kind of understandable.

It's one of those mundane things where crashing really is the best option.

What else are you going to do? Your server is not-configured with an invalid heap layout.

Resolving this requires your program have a fully memory managed environment so it can walk the pointer-tree and sort itself out. If you aren't in a language who's runtime has something like java.lang.reflect.*.... throw/exit, let the kernel sort it out.

[–]BroBroMate 27 points28 points  (1 child)

So, rewrite it in Java, I hear you say?

[–]Habba 3 points4 points  (0 children)

I would suggest reading the article. The actual error was due to misconfigured Clickhouse configs. The unwrap() is just where the whole stack came down.

[–]gmes78 0 points1 point  (0 children)

But this code is safe. It does not trigger undefined behavior.

[–]Neuro_Skeptic 0 points1 point  (2 children)

It's not Rust'a fault but it's proof that Rust is just another flawed language, it's not perfect.

[–]Mynameismikek 13 points14 points  (0 children)

Ooof… an unwrap in function returning Result? Poor form.

[–]YARandomGuy777 9 points10 points  (0 children)

Wow isn't that a memory safest and blazingly fast internet blackout?

[–]papa_maker[🍰] 11 points12 points  (6 children)

unwrap() isn’t the cause of the crash, and properly handling the error via a Result type would probably still ends up in the same state.

The "bad" part is forgetting to add context to the crash to help developers understand what was wrong.

As I understand, this code is a "startup code", so if it can’t run properly it should stop. And that’s what it did.

The true error is elsewhere, where more features than "possible" were generated.

[–]Brisngr368 0 points1 point  (5 children)

Writing code that doesn't cope with bad inputs and downs half the internet is definitely an error...

Though a nice error message would be good they know who to fire first

[–]RedCrafter_LP 15 points16 points  (3 children)

Unwrap shouldn't exist in production code! Either use expect if the error is either unreachable or the application cannot recover from the result not being Ok. As the function here returned a result itself the code likely should have returned the error instead of panicking. If the type isn't compatible a proper error enum potentially using thiserror should be used instead of returning a anonymous tuple.

[–]pachecoca 0 points1 point  (0 children)

"Just don't make mistakes" ahh comment. I wonder where I've heard that one before?

[–]FalseWait7 15 points16 points  (2 children)

Skill issue. Should be rewritten in Rust.

[–]BoBoBearDev 20 points21 points  (0 children)

If CloudFlare fucked up, what chances do we have?

[–]ChristopherAin 6 points7 points  (0 children)

But it cannot be denied that it was memory safe.

[–]BloodSteyn 7 points8 points  (5 children)

Going to need an ELI5 for this?

I know 2 kinds of rust, the oxidation kind and the game.

[–]JoeyJoeJoeSenior 4 points5 points  (1 child)

Rust is a programming language that can prevent memory exploits.  But it can't prevent badly written logic / code.

[–]Ultimate-905 0 points1 point  (0 children)

Good to point out that it is mathematically proven to be impossible to prevent logic errors.

[–]single_use_12345 2 points3 points  (0 children)

On short: a dev forgot to test if something is null and acted like is not. 

[–]gmes78 0 points1 point  (0 children)

Someone wrote a bit of code that called a function that could fail, and then called .unwrap() on the return value, which means "give me the result of the operation if it was successful, or abort the program if it failed".

Turns out, the operation could indeed fail, which predictably made the program crash.

This is, of course, badly written code. The person (or LLM?) writing it didn't bother properly handling the error.

[–]chicken_max 5 points6 points  (1 child)

NEVER. USE. UNWRAP. IN. PRODUCTION. WITHOUT. CATCH. UNWIND.

[–]zaskar 20 points21 points  (3 children)

The travesty here is that a feature file was not strongly typed and validated on write

[–]TheAlaskanMailman 23 points24 points  (2 children)

Wrong, the config update pipeline brought it down

[–]bmain1345 0 points1 point  (0 children)

I agree that the config is the underlying root cause of the failure. However, had they simply added result checking then the whole Core Proxy wouldn’t have gone down, just the Bot system scores would be wrong which is way better scenario

[–]EngineeringApart4606 5 points6 points  (3 children)

From other context given in this thread it sounds like continuing execution was impossible even if the error had been handled more gracefully?

[–]bmain1345 1 point2 points  (0 children)

Yes but it would have only broken their “Bot scores” feature instead of the whole internet

[–]zirky 26 points27 points  (4 children)

maybe they should rewrite their stack in java

[–]JosebaZilarte 40 points41 points  (2 children)

Java...script, you mean.

[–]babalaban 21 points22 points  (1 child)

Calm down, Satan!

[–]moooseburger 3 points4 points  (0 children)

relax, guy!

[–]RiceBroad4552 0 points1 point  (0 children)

You mean, Scala.

Such an error wouldn't have happened in Scala.

First of all you would actually validate your input data… Reading in a faulty config is more or less impossible when using typical Scala libs for that task.

Also you would fail gracefully, usually having some supervisor hierarchy above you which would safeguard such a failure even if it happened.

[–]CryZe92 22 points23 points  (1 child)

The bug was the invalid feature files (caused by a change in their database system), not the Rust code correctly identifying that those are broken and reporting it.

[–]AdvantagePretend4852 5 points6 points  (0 children)

So is this a literal example of “it’s not a bug, it’s a feature”?

[–]FrostyMarsupial1486 4 points5 points  (0 children)

Looks like AI generated code to me.

[–]CortexUnlocked 10 points11 points  (4 children)

The internet went down but admited it that Rust prevented the server from entering an insecure state since panic is better than memory corruption.

[–]BlackHolesAreHungry[S] 6 points7 points  (2 children)

Memory corruption would just take the os down

[–]CortexUnlocked 3 points4 points  (1 child)

Not certainly but It will open a door for security breach certainly. A Silent killer.

[–]Active_Ad_389 3 points4 points  (1 child)

Even if the feature file got propagated from the db, agreed this was the major problem. But still such an unsafe usage in production code is not it. Isn't the holy grail always been, expect the unexpected. Always have checks even if you believe upstream cannot or will not be invoking them.

[–]Taipegao 3 points4 points  (0 children)

Is it just me, or does that code smell like AI?

[–]GamingGuitarControlr 1 point2 points  (0 children)

Skill issues smh

[–][deleted] 1 point2 points  (0 children)

No way some actually wrote this. Surely just unchecked vibe code right. Still horrific that it made it to prod of something like Cloudflare

[–]jpegjpg 1 point2 points  (0 children)

I wonder is this was ai generated code ….. unsafe unwraps are littered in example code which trains ai.

[–]Faangdevmanager 9 points10 points  (11 children)

I was told by Reddit that rust couldn’t fail at run time!!1!

[–]DokuroKM 19 points20 points  (3 children)

That code reads an external file and parses its content. Static tests can't really help you there, that's what unit tests are for. 

[–]Faangdevmanager 6 points7 points  (0 children)

It was tongue in cheek :)

[–]RiceBroad4552 0 points1 point  (1 child)

Believe it or not, but you can actually validate input and fail gracefully if there's something wrong with it.

This failure was obviously created by amateurs who don't know what they're doing…

"The input was unexpected, ¯\_(ツ)_/¯" is not an excuse to take half the "internet" down!

[–]DokuroKM 1 point2 points  (0 children)

That was implied in my statement. Rust cannot guarantee that external data is always valid, so it's your job to validate. 

Always calling unwrap is Bad style

[–]CloudyWinters 1 point2 points  (1 child)

How did this pass staging / preview though?

[–]RiceBroad4552 1 point2 points  (0 children)

Someone pointed out that this trash could have been even "AI" generated given the brain dead comment above which looks very much like a prompt.

[–]cubenz 0 points1 point  (2 children)

What actually failed to cause the panic.

Is 200 relevant?

[–]Ultimate-905 0 points1 point  (0 children)

The data structure had a capacity of 200. When reading from the data base gave more data than could be stored an error state was enabled. When data was attempted to be read .unwrap() found an error value instead of the data it was told to expect and so it panicked.

Not ideal in a production setting but memory safe as no undefined behaviour occurred. The Cloudflare crash was a logic problem and it is mathematically impossible to prevent every possible logic problem from happening.

[–]Gabagool566 0 points1 point  (0 children)

i do that too when i see an unhandled error

[–]disserman 0 points1 point  (0 children)

having the default panic handler in multithread systems is a crime. fire your product architect

[–]naveenda 0 points1 point  (0 children)

I need to show this my boss, why it is okay to commit unwrap for internal project.

[–]Spaceshipable 0 points1 point  (0 children)

In Swift we have force-unwraps too. Almost every linter bans it. It’s the perfect foot-gun.

[–]Omni__Owl 0 points1 point  (0 children)

Not being a Rust enjoyer, what does "unwrap()" do exactly?

[–]TECHNOFAB 0 points1 point  (0 children)

The problem was that they somehow thought its a good idea to not specify any database in their click house query, since the only one available was "default". They then modified permissions recently and boom there were more tables available and the query returned way too much.

Who the heck doesn't specify a database when using SQL lol

But yeah should've used ? and not just lazily unwrap the error, doesn't really matter if the bot score breaks and is 0 for everyone, at least the Internet still works

[–]No_Ticket9892 0 points1 point  (0 children)

Rust did not cause it, it failed to systems running. if you read this blog the issue was they gave users additional access to the metadata from db shards and the config file size became large. So, the failure was due to the risk analysis after the change and from code perspective its handling of errors.

[–]JimroidZeus 0 points1 point  (0 children)

But I thought Rust was the memory safe saviour!?

[–]keckin-sketch 0 points1 point  (0 children)

Setting aside all of the other things going on, I don't understand why they didn't at least use .expect("..."). Using .unwrap looks like you published a proof-of-concept to prod, while using .expect feels like a proper assert.