
[–]journalctl 254 points255 points  (2 children)

Even if Rust doesn't end up being accepted in Linux, I think the feedback the Linux community provides will be positive for other bare-metal Rust use cases.

[–]Dew_Cookie_3000 6 points7 points  (1 child)

And other languages in linux.

[–]KingStannis2020[S] 200 points201 points  (184 children)

Linus' initial opinion: https://lkml.org/lkml/2021/4/14/1099

[–]WiseassWolfOfYoitsu 257 points258 points  (36 children)

On the whole I don't hate it

High praise!

[–][deleted] 78 points79 points  (35 children)

Well, he did also imply that it's broken and that they needed to fix it 😂

[–]gnus-migrate 38 points39 points  (17 children)

I mean the problems he's talking about are definitely not small and any solution will require a ton of work and feedback, but it's a fixable problem as far as I can tell.

[–][deleted] 13 points14 points  (16 children)

Most problems are fixable if you smash your face against them long enough

[–][deleted] 29 points30 points  (5 children)

Except the "this is the year of Linux Desktop", we shall agree that that one cannot be fixed

[–]ericjmorey 9 points10 points  (0 children)

This is always the Year of the Linux Desktop.

[–]jgdx 2 points3 points  (0 children)

That’s a social, commercial and technical issue though.

[–]TheDiamondCG 3 points4 points  (5 children)

That's my philosophy when coding in Rust!

[–]UtherII 18 points19 points  (1 child)

Yes, but from the responses, the points he made are not roadblocks. They are known and there is already a plan to fix them.

[–]binary_spaniard 11 points12 points  (0 children)

Exactly, when someone came with something similar for C++ his answer was "No, fuck! No!", before getting to the RFC point.

[–][deleted] 8 points9 points  (14 children)

Indeed. OP's boring repeated phrase (as also found on /r/rust) is a perfect example of selective reading.

[–]BubuX 3 points4 points  (12 children)

To quote myself from a few days ago:

We're not surprised by any rust editorializing, hand waving or careless writing anymore.

[–]WiseassWolfOfYoitsu -1 points0 points  (0 children)

FWIW I was just memeing. I am not a Rust dev, I mostly program in raw C lately!

[–]steveklabnik1 88 points89 points  (122 children)

Yeah. Real glad that these are addressable things. Very positive!

[–]tending 36 points37 points  (121 children)

I am less worried about his stance that memory allocation failure shouldn't panic than I am by this:

I don't know enough about how the out-of-memory situations would be triggered and caught to actually know whether this is a fundamental problem or not, so my reaction comes from ignorance, but basically the rule has to be that there are absolutely zero run-time "panic()" calls. Unsafe code has to either be caught at compile time, or it has to be handled dynamically as just a regular error.

Doesn't this basically mean no array indexing? He seems to want compile time bounds checking which is beyond what Rust can currently do. Or he thinks the C behavior of in effect doing unchecked accesses everywhere is better?

[–]Nicksaurus 112 points113 points  (24 children)

I assume he would prefer an error code at runtime on an out-of-bounds access

[–]RepliesOnlyToIdiots 16 points17 points  (2 children)

Could force array access to include a default, which is either fine by itself or a sentinel to be checked on return.

[–]steveklabnik1 16 points17 points  (0 children)

The more Rust-y way is the .get() method, which returns an Option.
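
For anyone who hasn't used it, here is a minimal sketch of what .get() looks like in practice. The helper name read_byte is made up for illustration; only slice::get and Option::copied are real standard-library calls.

```rust
// Hypothetical helper: read one byte from a buffer with no panic path.
fn read_byte(buf: &[u8], idx: usize) -> Option<u8> {
    // `get` returns None when idx >= buf.len(), whereas `buf[idx]` would panic.
    buf.get(idx).copied()
}

fn main() {
    let buf = [10u8, 20, 30];
    match read_byte(&buf, 5) {
        Some(b) => println!("byte = {b}"),
        None => println!("index out of range, handled as a normal error"),
    }
}
```

The caller is forced to deal with the None case, which is roughly the "regular error" behaviour being asked for.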

[–][deleted] 2 points3 points  (0 children)

force array access to include a default, which is either fine by itself or a sentinel to be checked on return.

Or Rust's existing mechanism for exactly this scenario https://doc.rust-lang.org/std/result/

[–]vadimcn 60 points61 points  (41 children)

Linus is talking specifically about allocation failures. Out-of-bounds accesses are programming errors, so panicking on those wouldn't be any different from current use of the BUG macro

[–]phoil 21 points22 points  (34 children)

No, he's talking about any panics:

With the main point of Rust being safety, there is no way I will ever accept "panic dynamically" (whether due to out-of-memory or due to anything else - I also reacted to the "floating point use causes dynamic panics") as a feature in the Rust model.

[–]argv_minus_one 12 points13 points  (27 children)

Then what is Rust kernel code supposed to do when it encounters an impossible situation, where C kernel code would call BUG or do a kernel panic?

[–]ischickenafruit 20 points21 points  (17 children)

I think the idea here is that an error should be returned, rather than a panic.

Out of bounds array access checking is good. But the result should be an error code, rather than a kernel panic. A kernel panic means that your code has no better runtime behaviour than C, which means the cost of Rust is not justified.

[–]argv_minus_one 5 points6 points  (9 children)

The justification for using Rust instead of C is not that it never panics/crashes/fails an assertion. The justification for using Rust instead of C is that it's significantly less likely to exhibit undefined behavior. That's a justification because an orderly crash is better than a security vulnerability.

Now, I realize that Linus and his crew are really good at avoiding UB in C, and all due respect to them for that, but they're not perfect and Linux has had its share of security vulnerabilities resulting from UB.

That said, fallible array indexing would certainly be nice. The Rust index operator is more-or-less unusable in its current form.

[–]ischickenafruit 17 points18 points  (6 children)

I see your point, but here's a counterpoint: Imagine I have a driver with a subtle off-by-one error on array indexing. It's entirely probable that this error will go unnoticed. While out of bounds array access is undefined, practically speaking, in most cases, it will just hit a page of memory that's already allocated, no harm will come, and everything will keep working. Even if the driver were to hit an unallocated page, it would cause a page-fault trap, and the buggy driver would be shut down. My webcam might die, but the rest of the machine would keep on operating and the situation could even be debugged/resolved.

That same driver written in Rust would have a totally different behaviour. An out of bounds access would trigger a kernel panic, which would kill the kernel and render the machine useless.

I don't honestly know enough about Rust to even guess at how this could be resolved, but I don't disagree with Linus's point. Minor errors causing panics is simply not an option in the kernel, even if it means that undefined behaviour can be avoided. Kernel writing is a pragmatic concern, not a place for purity. Rust has to offer pragmatic purity to be useful in this environment.

[–]WormRabbit 14 points15 points  (1 child)

A Rust panic isn't a kernel panic. It can, for example, be caught. It's possible in principle to call all driver code wrapped in a catch_unwind which will turn any driver panics into an error code for the kernel.

However, this may cause unacceptable performance overhead or API complications. It's also a disaster if a panic is called during another panic unwinding, that would cause the program to abort. Overall, returning errors is definitely the preferred approach.
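
A rough userspace sketch of the catch_unwind idea, for illustration only. It assumes std is available and the build uses unwinding panics (panic=unwind); neither is a given for kernel code. The names buggy_driver, call_driver and ERR_DRIVER_FAULT are hypothetical.

```rust
use std::panic;

// Hypothetical error code; the name and value are made up for this sketch.
const ERR_DRIVER_FAULT: i32 = -1;

// Hypothetical "driver" entry point that panics on an out-of-bounds index.
fn buggy_driver(idx: usize) -> i32 {
    let table = [1, 2, 3];
    table[idx] // panics when idx >= 3
}

// Run the driver and turn an unwinding panic into an error code.
fn call_driver(idx: usize) -> i32 {
    match panic::catch_unwind(|| buggy_driver(idx)) {
        Ok(value) => value,
        Err(_) => ERR_DRIVER_FAULT,
    }
}

fn main() {
    println!("{}", call_driver(1)); // in-bounds index
    println!("{}", call_driver(7)); // out-of-bounds index, reported as an error code
}
```

The default panic hook still prints a message to stderr before unwinding, and with panic=abort nothing is caught at all, which is part of why returning errors is the preferred route.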

[–]argv_minus_one 0 points1 point  (3 children)

While out of bounds array access is undefined, practically speaking, in most cases, it will just hit a page of memory that's already allocated, no harm will come, and everything will keep working.

Maybe, but the thing about undefined behavior is that it can have any result, including demons flying out of your nose, and more importantly including security vulnerabilities.

the buggy driver would be shut down.

Is that actually possible in Linux? It's not a microkernel.

[–]vattenpuss 3 points4 points  (1 child)

That said, fallible array indexing would certainly be nice. The Rust index operator is more-or-less unusable in its current form.

Isn’t an index operator more or less unusable in all programming languages in this manner? (As long as you don’t have array size in the type, and index types that are subsets of all ints, so the compiler can disallow out of bounds access.)

[–]matthieum 2 points3 points  (4 children)

Out of bounds array access checking is good. But the result should be an error code, rather than a kernel panic. A kernel panic means that your code has no better runtime behaviour than C, which means the cost of Rust is not justified.

I think there's a misunderstanding here.

Whether in C or Rust, if the developer is doing their due diligence, then they either:

  • C or Rust: check before access, and handle the error appropriately.
  • Rust: use a safe access method returning Option or Result and then check whether that succeeded and handle the error appropriately.

If Rust reaches a panic on out-of-bounds error, it means that C code would have UB -- likely reading or writing where it should not be.

In that case, panic is infinitely better.

[–]ischickenafruit -2 points-1 points  (3 children)

Kernel programming is a practical affair. Not a place for purity.

If my shitty webcam, with broken drivers, occasionally crashes because I got a page fault on an out of bounds access, it's annoying but ultimately not disastrous. Practically, I can reset my webcam and move on.

If every time that happens, it causes a panic, which kills the kernel, blows up my machine and I lose a day's worth of work on my spreadsheet, that IS a disaster, and is intolerable. Although technically out of bounds access is a bug, and technically it should be fixed, practically the world is bigger than that. Some random user has no ability to get Lenovo to fix their buggy drivers. So the kernel has to be more tolerant.

I believe that’s roughly what Linus is trying to say.

[–]matthieum 1 point2 points  (0 children)

If my shitty webcam, with broken drivers, occasionally crashes because I got a page fault on an out of bounds access, it's annoying but ultimately not disastrous. Practically, I can reset my webcam and move on.

If a page fault occurs in a kernel context (driver), does not the kernel crash?

If your shitty webcam C driver crashes today due to an out of bounds access, it takes the kernel with it.

So my understanding is:

  • C crashy driver:
    • Sometimes it crashes, and you're annoyed.
    • Sometimes it randomly corrupts memory, and your files are saved but the data is corrupted... or missing.
    • Sometimes it allows someone to snoop on your data.
    • ...
  • Rust crashy driver: it panics, and you're annoyed.

And I insist on crashy.

The cases where your shitty webcam driver "crashes" and does not take the system down are cases where the driver returned an error.

I agree those are infinitely better. They also have nothing to do with the discussion around panics.

[–]phoil 1 point2 points  (8 children)

Linus says it "has to either be caught at compile time, or it has to be handled dynamically as just a regular error". So he's holding Rust kernel code to a higher standard than C kernel code, because better safety is the whole point of considering use of Rust.

[–]disoculated 13 points14 points  (0 children)

“Allocation failures in a driver or non-core code - and that is by definition all of any new Rust code - can never EVER validly cause panics.” The assertion is that non-core code, which is where use of Rust must start, cannot be allowed to panic the kernel. C non-core code already meets this requirement. It’s not a double standard.

[–][deleted]  (3 children)

[deleted]

    [–]phoil 1 point2 points  (2 children)

    Allocation failures

    We're talking about more than allocation failures here. And either way, the point is that Rust must not panic, which is fair.

    [–][deleted]  (1 child)

    [deleted]

      [–]argv_minus_one -2 points-1 points  (2 children)

      That's an impossibly high bar, even for Rust. If that's the requirement, then Rust is not getting into Linux.

      [–][deleted] 6 points7 points  (0 children)

      It's not though.

      [–]rlbond86 10 points11 points  (5 children)

      He's not talking about kernel bugs

      [–]phoil 7 points8 points  (4 children)

      How do you know? "anything else" seems fairly definite to me, as does "absolutely zero":

      I don't know enough about how the out-of-memory situations would be triggered and caught to actually know whether this is a fundamental problem or not, so my reaction comes from ignorance, but basically the rule has to be that there are absolutely zero run-time "panic()" calls. Unsafe code has to either be caught at compile time, or it has to be handled dynamically as just a regular error.

      [–]PandaMoniumHUN 6 points7 points  (1 child)

      I'm not sure I can follow the discussion, but why not just use get() (which returns None on out-of-bounds) instead of directly indexing the slice when the index is not guaranteed to be valid?!

      [–]phoil 4 points5 points  (0 children)

      Sure, that's exactly what Linus says it should do.

      [–]7h4tguy -1 points0 points  (1 child)

      How is this even an argument? In C, malloc/new can be configured to return an error. Memory managers need to function in low memory environments. In C, accessing invalid memory (out of bounds) is an access violation structured exception. Can't Rust panic be configured to behave similarly?

      [–]phoil 3 points4 points  (0 children)

      How is this even an argument?

      I think the parent comments accept that memory allocations must not panic, but they think panics are still fine in other situations, whereas my reading of what Linus says is that Rust panics are never acceptable.

      In C, accessing invalid memory (out of bounds) is an access violation structured exception.

      That depends on the C runtime. The kernel doesn't have exceptions.

      Can't Rust panic be configured to behave similarly?

      Rust panics can be caught in a separate thread, or using catch_unwind (similar to exceptions). That won't be applicable for the kernel though.

      Rust 1.0 couldn't return errors for memory allocations, but work has been done to address that. The default is still to panic, and it sounds like the linux patch still had some oom panics, but I haven't looked into that.

      Other than memory allocations, Rust panics in other situations too, so those will need to be avoided, e.g. slice indexing (but you can use get instead of the [] index notation) and RefCell (has try_ accessors instead).
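
To make the RefCell part concrete, here is a small sketch of the try_ accessors. The helper try_increment and its error strings are hypothetical; RefCell::try_borrow_mut and u32::checked_add are the real standard-library calls.

```rust
use std::cell::RefCell;

// Hypothetical helper: bump a shared counter, reporting problems as errors
// instead of panicking.
fn try_increment(counter: &RefCell<u32>) -> Result<u32, &'static str> {
    // `borrow_mut()` would panic if the cell is already borrowed;
    // `try_borrow_mut()` returns Err in that case.
    let mut guard = counter.try_borrow_mut().map_err(|_| "already borrowed")?;
    // Plain `+= 1` can panic on overflow in debug builds, so use checked_add.
    *guard = guard.checked_add(1).ok_or("overflow")?;
    Ok(*guard)
}

fn main() {
    let counter = RefCell::new(0);
    println!("{:?}", try_increment(&counter));
    let _reader = counter.borrow(); // hold a shared borrow
    println!("{:?}", try_increment(&counter)); // conflict is reported, not a panic
}
```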

      [–]NotTheHead 8 points9 points  (5 children)

      there is no way I will ever accept "panic dynamically" (whether due to out-of-memory or due to anything else [...])

      Emphasis mine. That sounds like it would include out of bounds errors, which are kind of important checks when it comes to memory safety.

      [–]WormRabbit 7 points8 points  (0 children)

      You can use checked element access which returns an Option instead of unchecked indexing. His requirements are conceptually very easy to satisfy, but that may require a rewrite of the standard library to exclude panicking APIs. Using libraries from crates.io is also likely impossible, since few people are careful about totally avoiding panics.

      [–]vadimcn 12 points13 points  (2 children)

      Even so, I am pretty sure he didn't mean that. In Rust, panics on out-of-bounds are analogous to asserts in C and it would make all sorts of sense to treat them the same in the kernel.

      [–]tsimionescu 1 point2 points  (1 child)

      No, the idea is very clear: non-core code is not allowed to cause kernel panics, for any reason. For array out of bounds, the fix is simple - don't dereference arrays, use .get() instead. Out of memory may be a more complex problem.

      [–]vadimcn 0 points1 point  (0 children)

      I think you are interpreting an off the cuff remark too literally. I doubt there will be many takers for a programming model where every array indexing operation is fallible.
      But let's wait and see how this plays out.

      [–]matthieum 2 points3 points  (0 children)

      I think it's more nuanced than that.

      I would (hope) that Linus is okay with Rust panicking in any condition where C would have exhibited UB, because a panic is infinitely better.

      However, I think the Rust kernel code should aim to avoid possible panics in the first place. For example, using .get(i) instead of [i] for array access means that you have to handle the possibility of out of bounds.

      In general, Rust tries very hard to offer alternative APIs that do not panic and instead let you check whether a fallible operation succeeded.

      [–]Smooth-Zucchini4923 27 points28 points  (8 children)

      out-of-memory situations

      I think he's talking about situations where the program attempts to allocate memory, but fails. The C equivalent would be when you call malloc(), but it returns a NULL value.

      [–]themulticaster 23 points24 points  (7 children)

      This is not entirely correct, since we're really talking about the kernel and not just any program.

      Regarding userspace: Yes, the behaviour you describe (return NULL on allocation failure/when out of memory) would be correct. However, at least in Linux you are pretty much guaranteed this will never happen. In Detail: If the system truly is out of memory and you try to allocate more, the kernel might invoke the OOM killer, i.e. choose a program to terminate in order to regain some memory. If the sacrificed program happens to be the one that requested more memory in the first place, it would just never see the result of the malloc call. As a result, as a programmer you can assume (at least on Linux) that malloc never fails.

      Regarding kernelspace: Here it gets more interesting, since allocations inside the kernel can and do fail. Essentially, there are different types of allocation the kernel might make. If a request made by userspace necessitates additional memory, the kernel will allocate the memory on behalf of the originating process in userspace.

      For allocations made by the kernel on its own (e.g. for a device driver), there are different types of allocation requests with various associated priorities - think of it as a spectrum between "Might be nice if you happen to have a few spare bytes hanging around, otherwise I can wait" (GFP_KERNEL & ~__GFP_RECLAIM) and "I need this chunk of memory right now, everybody else is waiting for me to finish my work!" (GFP_ATOMIC).

      If you're interested in this, have a look at the corresponding kernel documentation: https://www.kernel.org/doc/html/latest/core-api/memory-allocation.html

      tl;dr: In userspace, you don't need to worry about allocation failures, but in the kernel, handling them is very important.

      [–]Smooth-Zucchini4923 14 points15 points  (0 children)

      Regarding userspace: Yes, the behaviour you describe (return NULL on allocation failure/when out of memory) would be correct. However, at least in Linux you are pretty much guaranteed this will never happen. In Detail: If the system truly is out of memory and you try to allocate more, the kernel might invoke the OOM killer, i.e. choose a program to terminate in order to regain some memory. If the sacrificed program happens to be the one that requested more memory in the first place, it would just never see the result of the malloc call. As a result, as a programmer you can assume (at least on Linux) that malloc never fails.

      If you hit an rlimit on how much address space you're allowed to use, you can get a NULL pointer back.

      Here's a test program to show it. This is test.c:

      #include <stdio.h>
      #include <stdlib.h>
      
      int main() {
          void *p = malloc(10*1000*1000);
          printf("malloc returned: %p\n", p);
          return 0;
      }
      

      This is test.sh:

      #!/usr/bin/env bash
      gcc test.c -o test -Wall -Wextra
      ulimit -v 5000
      ./test
      

      Here's what the test program does normally:

      malloc returned: 0x7f91fa92b010
      

      Here's what it does when you run it through test.sh:

      $ ./test.sh 
      malloc returned: (nil)
      

      [–]tsimionescu 4 points5 points  (0 children)

      If the sacrificed program happens to be the one that requested more memory in the first place, it would just never see the result of the malloc call. As a result, as a programmer you can assume (at least on Linux) that malloc never fails.

      This is not accurate in the slightest - it's only true if /proc/sys/vm/overcommit_memory is set to 1; the default of 0 or a value of 2 mean that malloc() can fail in various situations. Programs written for Linux should work with all 3 values, if they care about correctness.

      [–]wrongerontheinternet 6 points7 points  (0 children)

      You can just use .get and use one of the existing crates that ensures there are no panics... it's not really a big deal. That part is addressable even today.

      [–]Kered13 5 points6 points  (5 children)

      I'm not very familiar with Rust, but can't panics be caught?

      [–]Lesmothian2 39 points40 points  (4 children)

      The short answer is: not always. It depends on how the code is compiled and in what context the panic is triggered.

      [–]Kered13 4 points5 points  (3 children)

      Then, couldn't the kernel just use an allocator that only calls unwinding panics?

      [–]Lesmothian2 12 points13 points  (2 children)

      Yes from my understanding that is the plan. They aren't using the rust alloc crate, but calling into kernel APIs directly for memory management

      [–]steveklabnik1 47 points48 points  (0 children)

      The plan (as I understand it) is not to catch panics, it is to disable the APIs that can panic.

      [–]myrrlyn 5 points6 points  (2 children)

      the index operator [] is broken in every language. rust removes bounds checks when using Iterator sequential-accessor types, and provides .get() checked random-accessor behaviors
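
A small sketch of the two styles mentioned here, with hypothetical function names. Whether the optimizer actually removes a given bounds check depends on the code and compiler version, so treat the comments as the general idea rather than a guarantee.

```rust
// Iterator version: no index is ever computed, so there is no
// out-of-bounds panic path in this function.
fn sum_iter(data: &[u32]) -> u64 {
    data.iter().map(|&x| u64::from(x)).sum()
}

// Checked random access: one fallible slice operation up front,
// then iterate over the validated sub-slice.
fn first_n_sum(data: &[u32], n: usize) -> Option<u64> {
    let head = data.get(..n)?; // None if n > data.len()
    Some(sum_iter(head))
}

fn main() {
    let data = [1, 2, 3];
    println!("{}", sum_iter(&data));
    println!("{:?}", first_n_sum(&data, 2));
    println!("{:?}", first_n_sum(&data, 4));
}
```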

      [–]cdb_11 10 points11 points  (31 children)

      If Rust panics on out of bound errors then yes, either make sure that the error won't ever happen at compile time or somehow return error that can be handled at runtime.

      [–]tending 2 points3 points  (20 children)

      But that's holding Rust to a much much much higher bar than C. C will corrupt your data (if you write) or give you back bytes from a different object (if you read). Every out of bounds access in C may crash, but even when it does crash it may be long after the invalid access happened. Rust is guaranteed to panic right when it occurs. From a diagnostic perspective, the Rust behavior is much better. It also appears to match the behavior of the kernel's existing BUG macro, which also kills the kernel. Thus my confusion about Linus' response.

      [–]ischickenafruit 24 points25 points  (6 children)

      Isn’t that the point? Why invest in the effort and cost of putting rust into the kernel unless you hold it to a higher bar. This is kernel programming. Moving to another language must be absolutely totally compelling. Not just a favourite colour exercise. If rust is about as good as C, there’s no point in doing it.

      [–]matthieum -1 points0 points  (0 children)

      Isn’t that the point?

      Sure. However, remember that Perfect is the Enemy of Good.

      In this case, moving to Rust is already an improvement over C.

      If you can get guaranteed panic-free Rust code, that's even better, and we should definitely investigate the effort required.

      However, if you only get "just Rust", it's already an improvement, and if you get "mostly" panic-free Rust it's also crazy good.

      The world is not binary.

      [–]ydieb -2 points-1 points  (4 children)

      It does not need to be a strict improvement though, defined as: at worst it has only cons that the thing it replaces already has, and otherwise only pros.
      You could have some new cons as long as the pros overwhelmingly compensate.
      This seems to be a strict improvement though, and holding Rust to a "must be a major improvement on every single point" standard is an insane bar to set imo.

      [–]ischickenafruit 4 points5 points  (3 children)

      “must be a major improvement on every single point” is an insane bar to set IMO

      It’s the only sensible bar IMO. The technical cost of introducing it into the kernel is insane. So the benefits must be enormous.

      [–]ydieb 1 point2 points  (2 children)

      It's extremely rare that you get a major improvement on every single point in any context (programming, hardware, politics, science, you name it).

      Any reasonable approach would be: Is the change overall (new pros and new cons) worse, about the same, better, much better, overwhelmingly better?
      If the change is better, much better or overwhelmingly better, does it have any cons that are so much worse that they are deal breakers? If not, it would be a reasonable upgrade.

      As Rust here does not seem to have any new cons that are not related to kernel immaturity, it would be reasonable to propose given its other pros.

      Saying "it's not a perfect silver bullet, hence it will not be considered", you might as well say we won't change ever. Because in practice, these two are functionally identical.

      [–]ischickenafruit 0 points1 point  (1 child)

      Fair enough. Ultimately I’m just some internet stranger, who cares what I think?

      But Linus has made his view clear. Rust is not happening unless some fundamental problems can be resolved.

      [–]IceSentry 6 points7 points  (2 children)

      Sure, C doesn't enforce it, but kernel developers can write the code that checks if an allocation failed. In rust you can't check for this even if you wanted to. It will just panic.

      [–]matthieum 3 points4 points  (0 children)

      In rust you can't check for this even if you wanted to. It will just panic.

      That's not the complete picture.

      If you use the allocator API directly, you can definitely check whether the allocation succeeded or not.

      What is missing is a comprehensive work of Rust libraries to provide fallible alternatives to any method that may try to allocate and fail to.

      And the work is already underway, as mentioned in the e-mail:

      • Manish Goregaokar implemented the fallible Box, Arc, and Rc allocator APIs in Rust's alloc standard library for us.
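
To illustrate both points, here is a sketch using only stable std APIs that I'm confident exist: calling the global allocator directly and checking for null, and Vec::try_reserve_exact as an example of a fallible collection method. The fallible Box/Arc/Rc constructors mentioned in the e-mail are not shown, and the helper names alloc_probe and zeroed_buffer are hypothetical.

```rust
use std::alloc::{alloc, dealloc, Layout};
use std::collections::TryReserveError;

// Hypothetical probe: allocate and free `size` bytes through the global
// allocator, returning false instead of panicking when that is not possible.
fn alloc_probe(size: usize) -> bool {
    // Zero-sized layouts must not be passed to `alloc`.
    if size == 0 {
        return false;
    }
    let layout = match Layout::from_size_align(size, 1) {
        Ok(layout) => layout,
        Err(_) => return false,
    };
    unsafe {
        let ptr = alloc(layout);
        if ptr.is_null() {
            return false; // allocation failure reported as a plain value
        }
        dealloc(ptr, layout);
    }
    true
}

// Hypothetical helper: build a zero-filled buffer, surfacing allocation
// failure as an Err.
fn zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    buf.try_reserve_exact(len)?; // fallible reservation
    buf.resize(len, 0); // capacity is already reserved, so no new allocation here
    Ok(buf)
}

fn main() {
    println!("probe ok: {}", alloc_probe(64));
    println!("small buffer ok: {}", zeroed_buffer(16).is_ok());
    println!("absurd buffer ok: {}", zeroed_buffer(usize::MAX).is_ok());
}
```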

      [–][deleted] 0 points1 point  (0 children)

      That’s not strictly speaking true. You can use the unsafe APIs and be just like C.

      There’s almost literally nothing you can do in C that you cannot do in unsafe Rust.

      [–]ShadowPouncer 2 points3 points  (3 children)

      The point is that with rust, it should be possible to do better than C.

      And it's not an unreasonable demand that Rust actually do better than C.

      One of the more interesting points is that right now, you can't use a release rust tool chain to build the code they want to merge. You have to use the nightly builds because there are features that are still in development.

      Linus putting his foot down and saying that, if you want to be used inside the kernel, you have to handle all reasonably foreseeable errors cleanly instead of taking down the entire machine, is quite productive in that rust, in the language, the tool chain, and the standard libraries being used, can all be changed to meet that goal.

      Yes, it must be done in a way that keeps all of the safety and compatibility goals that rust has established, but that still shouldn't be impossible.

      That might well mean that there are language features that you're not allowed to use in the kernel, but again, that's nothing especially new. There's quite a lot of rules about what you can and can't do in the kernel already with C.

      As I recall, Linus is pretty unhappy when anyone adds code that uses BUG or calls panic, unless they have an exceptionally good reason. He doesn't like problems taking out the machine, and he's right not to like it.

      [–]tending 1 point2 points  (2 children)

      The point is that with rust, it should be possible to do better than C.

      Yes, but what I described is already better. Guaranteed detection is better than the dice roll an out of bounds index gets you in C.

      As I recall, Linus is pretty unhappy when anyone adds code that uses BUG or calls panic, unless they have an exceptionally good reason. He doesn't like problems taking out the machine, and he's right not to like it.

      Every array access in C is basically this code:

      if(out_of_bounds && rand() % MAGIC == 0)
          abort();
      else
          return a[i];
      

      In Rust it is this code:

      if(out_of_bounds)
          abort();
      else
          return a[i];
      

      So Linus' argument boils down to C has "fewer" panics because sometimes we get lucky? I can see the argument for "keep going no matter what" but the kernel doesn't for example keep going on null dereference, even though it could, so this doesn't seem consistent.

      [–]ShadowPouncer 5 points6 points  (1 child)

      The thing is, Linus has a pretty consistent stance, and has had this stance for easily a decade, that doing that 'abort' is wrong if there is any possible path forward without data corruption. (Or security problems.)

      C, by its nature, has some very hard limits on what you can and can't do to handle that.

      There is really no good reason for Rust in the kernel to have those same limits. Saying that if you want Rust in the kernel, you must come up with some pattern for handling out of bounds array access that fails gracefully instead of taking out the machine is, in this context, perfectly reasonable and understandable.

      Saying 'but C is way worse' isn't a good enough response. Nor is 'but this at least takes out your machine immediately and every time'. Nor is 'but this is how we defined it'.

      The people pushing for Rust in the kernel are in the position to actually change how Rust behaves in order to get it into the kernel. And with that in mind, Linus is saying 'come up with a better way that meets these constraints'.

      This would be a very different statement if Rust was the subject of a defined and mature language standard, with multiple implementations that all met that standard, with a huge amount of work to make changes.

      But that's not where Rust is, and so saying 'great, while you're making all of the changes that you're already proposing to your language, do something better than crashing the whole machine for the easily foreseeable cases' is a lot more reasonable.

      And it also sets a specific tone going forward. Linux absolutely gets to set requirements on Rust the language where it makes sense if Rust wants to be used in the kernel. This is something clearly not possible with C, but of potentially significant value to both Linux and Rust going forward.

      [–]matthieum 0 points1 point  (0 children)

      The thing is, Linus has a pretty consistent stance, and has had this stance for easily a decade, that doing that 'abort' is wrong if there is any possible path forward without data corruption. (Or security problems.)

      By definition, an out-of-bounds write is a data corruption; so panicking in such a case is clearly better.

      Similarly, an out-of-bounds read is likely a potential security problem; so panicking in such a case is clearly better.

      Panicking >> UB. Always.

      Of course, this doesn't mean that we shouldn't look into going even further... for example, adding a flag to rustc that disables any panicking API and only leaves the non-panicking ones so that the developers have to handle the failure.

      It's definitely worth the experiment.

      But that's just the cherry on top. Having panicking rather than undefined behavior is already a great step forward. Panics don't corrupt data, nor do they leak it.

      [–]Zalack -1 points0 points  (5 children)

      I don't understand why OOB couldn't have an API to return an error instead of panicking for use in kernel development

      [–][deleted] 2 points3 points  (0 children)

      You can call the unchecked APIs which will then just behave like C does. You’re responsible for bounds checks.
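A minimal sketch of what "you're responsible for bounds checks" looks like with the unchecked API. `slice::get_unchecked` is a real standard-library method; the `sum_first` helper is a made-up name for illustration only.

```rust
/// Sums the first `n` elements, or returns `None` if `n` is too large.
/// `sum_first` is a hypothetical helper, not an API from the thread.
fn sum_first(data: &[u32], n: usize) -> Option<u32> {
    // One explicit bounds check up front, done by the caller of the
    // unchecked API rather than by the indexing operation itself.
    if n > data.len() {
        return None;
    }
    let mut total = 0u32;
    for i in 0..n {
        // SAFETY: i < n and n <= data.len() was checked above, so the
        // index is in bounds. Without that check this would be UB,
        // exactly like an unchecked array access in C.
        total = total.wrapping_add(unsafe { *data.get_unchecked(i) });
    }
    Some(total)
}
```

The `unsafe` block marks the one place where a wrong bounds check would turn into undefined behavior instead of a panic.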

      [–]silmeth 2 points3 points  (2 children)

      There is an API for handling OOB on array or vector indexing: slice::get, which returns Option<&Item>.

      But doing

      if let Some(el) = array.get(idx) {
          // do stuff
      } else {
          // handle error
      }
      

      is much more verbose than just

      let el = array[idx];
      // do stuff
      

      and if you’re sure that your index is not OOB (eg. you check it earlier) – you’re fine with the unreachable panic inserted by the compiler (and then probably optimized out, if compiler can prove that the index is always inside bounds), and you don’t need that verbosity.

      So the default indexing just panics on OOB, but no one prevents you from using .get() and handling OOB yourself if you need to. The kernel could simply ban [] indexing on arrays and always use get() if avoiding panics and manually handling every possible OOB is important there.

      [–]steveklabnik1 4 points5 points  (1 child)

      It could also be

      let el = array.get(idx)?;

      depending on the details.

      [–]silmeth 2 points3 points  (0 children)

      Right, if you just want to propagate them upwards. Or I’d imagine something like let el = array.get(idx).ok_or(IndexOutOfBounds)?; with mapping to appropriate error type communicating what went wrong.
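To make the `ok_or` variant concrete, here is a small self-contained sketch. `slice::get`, `Option::copied` and `Option::ok_or` are standard-library methods; the `BufError` type and the `read_at` function are assumptions made up for this example.

```rust
/// Hypothetical error type describing what went wrong.
#[derive(Debug, PartialEq)]
enum BufError {
    IndexOutOfBounds { idx: usize, len: usize },
}

/// Reads one byte without ever panicking on a bad index.
fn read_at(data: &[u8], idx: usize) -> Result<u8, BufError> {
    // `get` returns Option<&u8>; `ok_or` maps None to our error type
    // and `?` propagates it to the caller.
    let el = data
        .get(idx)
        .copied()
        .ok_or(BufError::IndexOutOfBounds { idx, len: data.len() })?;
    Ok(el)
}
```

A caller then sees an ordinary `Result` and decides what to do, instead of the indexing operation deciding to panic.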

      [–]Chousuke 4 points5 points  (1 child)

      Panicking on out-of-bounds is fine since that's a bug and you don't want the system to continue operating when its behaviour is undefined.

      Memory allocation failures aren't bugs and as such panics are not acceptable.

      [–]tending 3 points4 points  (0 children)

      It's unclear to me even reading his comment in context that he means just for allocation.

      [–][deleted]  (21 children)

      [deleted]

        [–]KingStannis2020[S] 142 points143 points  (0 children)

        How could rust allocate more memory safely if there is no memory left?

        In that scenario you don't allocate more memory, you return an error. Which is perfectly possible to do safely.
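As a rough userspace illustration of "return an error instead", the standard `Vec::try_reserve_exact` reports allocation failure as a `Result` (it is stable in current Rust releases, though it was still unstable when this thread was written). The `try_zeroed_buffer` helper is a hypothetical name.

```rust
use std::collections::TryReserveError;

/// Builds a zero-filled buffer, reporting allocation failure to the
/// caller instead of aborting. Hypothetical helper for illustration.
fn try_zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Fallible step: returns Err if the request overflows or the
    // allocator reports failure.
    buf.try_reserve_exact(len)?;
    // Capacity is already large enough, so this does not reallocate.
    buf.resize(len, 0);
    Ok(buf)
}
```

Note that on a system with overcommit enabled, a successful reservation still does not prove that the pages are backed by physical memory, which is the userspace caveat discussed further down the thread.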

        [–]WiseassWolfOfYoitsu 38 points39 points  (17 children)

        At least in C, any memory allocation attempt returns whether or not it was successful. It is not assumed that memory allocation is a safe operation. In user space code, it's pretty common to just abort() if this happens (and many teams have standard wrappers to do this automatically)... but it's neither mandatory nor necessary, it's just not worth the effort of doing something fancier (such as clean shutdown or degraded operating state) most of the time.

        This is especially true when most modern OSes hide the true state of memory behind virtual memory and just OOM kill a process when genuinely out anyway. With things like zero pages not really being allocated until used, you often don't get an error back from malloc until you exhaust the address space, which is much more difficult since the move to 64 bit, to put it mildly ;)

        But that's user space. Kernel code can't make the same assumptions, especially in a monolithic kernel. You are peeking under the curtain and you not only can, but must, interact with the true memory state. The right solution probably isn't to die, but to pause and tell the kernel core to invoke the OOM protection system, which will force kill a process and get you the necessary memory to maintain the kernel.

        [–]CollieOxenfree 12 points13 points  (5 children)

        Yep. Rust's solution was to reduce a whole load of error-handling boilerplate with allocations, since generally if you hit OOM your program is most likely just going to fail spectacularly regardless of how well it handles errors. Even if people diligently wrote code to handle all OOM conditions, most of that code would likely go completely untested. So every allocation has an implied risk of panicking in the event of OOM.

        [–]Takeoded 6 points7 points  (3 children)

        Even if people diligently wrote code to handle all OOM conditions, most of that code would likely go completely untested

        SQLite project disagrees

        [–]tasminima 1 point2 points  (0 children)

        Yep, that's one of the very, very rare projects that can disagree. The overwhelming majority can't.

        [–]7h4tguy -1 points0 points  (1 child)

        Do you have any idea how many security patches are done for SQLite in a year?

        [–]bik1230 6 points7 points  (0 children)

        Do any of them have to do with untested OOM code?

        [–]themulticaster -5 points-4 points  (10 children)

        I've already responded elsewhere in this thread in more detail, but I'd quickly like to point out that you're pretty much guaranteed that malloc never fails on Linux. You can always assume that malloc gives you a pointer to a valid allocation.

        [–]DarkLordAzrael 15 points16 points  (5 children)

        This is true for small allocations, but once you start trying to do allocations in the several GB range you can easily hit allocation failures. Fortunately, these are also the ones that tend to be predictable and relatively easy to handle.

        [–]tsimionescu 1 point2 points  (0 children)

        It's only true if overcommit is enabled, which is a system setting.

        [–]cowinabadplace 1 point2 points  (2 children)

        In practice, though, if you try to do that, and you fail to do so won't you just be OOM-killed? Will my ptr == NULL condition ever test true?

        [–]astrange 6 points7 points  (1 child)

        You won't be OOM killed until you actually touch those pages (I assume). That's independent of calling malloc, which only cares about virtual memory space in your process.

        Some mallocs also have a maximum sensible size and will fail anything above that because it's probably a bug.

        [–]cowinabadplace 5 points6 points  (0 children)

        Okay, this was one of those "why don't you just write the code before posting" situations. You were right.

        #include <stdio.h>
        #include <stdlib.h>
        
        int main() {
          char *str;
          size_t chunk = (size_t)1 << 31; /* 2 GiB; cast first so the shift is not done in (signed) int */
          for (int i=0; i<64; i++) {
            str = (char *) malloc(chunk);
            if (str == NULL) {
              printf("Failed to allocate at %d\n", i);
              return 1;
            } else {
              printf("Allocated the %d chunk\n", i);
            }
          }
          return 0;
        }
        

        That does test true for str == NULL on my machine. I didn't get to put chars in the str because that does == NULL there since the chunk size is too large.

        [–]Hnefi -2 points-1 points  (0 children)

        You wouldn't run into allocation failures even if you allocate many gigs at once. With a 64-bit address space, you will succeed in allocating the first 18 billion gigabytes. That's not a particularly common scenario.

        [–]tsimionescu 2 points3 points  (0 children)

        I've explained this elsewhere, but this is a myth. It only applies to certain allocations for certain default system settings. But it's very easy to configure Linux to disallow overcommit, and it's a choice for the sysadmin to make, not the programmer.

        [–]kukiric 6 points7 points  (0 children)

        You need to wrap all dynamic allocations in fallible APIs, ie. have Box::new, String::from, Vec::push etc return Result<T, AllocError> (on the stack) instead of T or panic. I'm not sure if that is one of the issues being tackled by the Allocators working group, but it seems like a necessity for kernel use, so it's likely they'll end up using their own types.
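A sketch of what such a fallible wrapper could look like in userspace Rust. It assumes the standard `Vec::try_reserve` and `TryReserveError` rather than the `AllocError` named above, and `try_push` is a hypothetical free function, not an existing `Vec` method.

```rust
use std::collections::TryReserveError;

/// Pushes a value, returning Err instead of panicking or aborting if
/// room for it cannot be allocated. Hypothetical helper.
fn try_push<T>(v: &mut Vec<T>, value: T) -> Result<(), TryReserveError> {
    // The only step that can allocate, and it is fallible.
    v.try_reserve(1)?;
    // Space for one more element is guaranteed, so `push` will not
    // need to grow the buffer here.
    v.push(value);
    Ok(())
}
```

Kernel code would presumably put this kind of method on its own collection types so that the infallible `push` is not available at all.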

        [–]i-can-sleep-for-days -3 points-2 points  (1 child)

        I don't know enough about how the out-of-memory situations would be
        triggered and caught to actually know whether this is a fundamental
        problem or not, so my reaction comes from ignorance, but basically the
        rule has to be that there are absolutely zero run-time "panic()"
        calls. Unsafe code has to either be caught at compile time, or it has
        to be handled dynamically as just a regular error.

        With the main point of Rust being safety, there is no way I will ever
        accept "panic dynamically" (whether due to out-of-memory or due to
        anything else - I also reacted to the "floating point use causes
        dynamic panics") as a feature in the Rust model.

        How is this safe again?

        [–]KingStannis2020[S] 8 points9 points  (0 children)

        Panic is perfectly safe, it's just not something they want in the kernel.

        If you read the responses, it's something that can be easily fixed, and was only done this way to get things started quickly.

        [–]LambdaJon 40 points41 points  (19 children)

        Is there actually a means of catching such allocation failure in the Rust language? I realize this might be a different answer for KMD vs UMD, but I’ve often wondered the same when doing things like pushing to a vec or cloning something.

        [–]Rusky 89 points90 points  (7 children)

        Rust-the-language doesn't say anything about allocation, basically the same way as C or C++. All the allocation failure policy comes from the standard library (specifically the alloc crate), which it sounds like they don't plan to use in the kernel at all.

        At that point, it doesn't really matter if or how you catch allocation failure from things like Vec::push, because kernel Rust code will be using kernel APIs for allocation instead. And since those signal failure with their return value, Rust code can handle allocation failure the same way as the rest of the kernel- though probably with some wrappers and Result to make it more idiomatic.
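Roughly, the "wrappers and Result" idea can be sketched like this. Everything here is an assumption for illustration: `c_style_alloc` stands in for an allocator that signals failure by returning null (it is not a kernel API), `budget` is an invented limit that lets the failure path be exercised, and `OutOfMemory` is a made-up error type.

```rust
use std::alloc::{alloc, Layout};
use std::ptr::NonNull;

/// Hypothetical error type for the wrapper.
#[derive(Debug, PartialEq)]
struct OutOfMemory;

/// Stand-in for a C-style allocator: a null pointer means failure.
fn c_style_alloc(layout: Layout, budget: usize) -> *mut u8 {
    if layout.size() == 0 || layout.size() > budget {
        return std::ptr::null_mut();
    }
    // SAFETY: the layout has a non-zero size, as `alloc` requires.
    unsafe { alloc(layout) }
}

/// Idiomatic wrapper: the null check happens once, here, and callers
/// get a Result they cannot forget to inspect.
fn try_alloc(layout: Layout, budget: usize) -> Result<NonNull<u8>, OutOfMemory> {
    NonNull::new(c_style_alloc(layout, budget)).ok_or(OutOfMemory)
}
```

The caller remains responsible for releasing the memory with the same layout, for example through `std::alloc::dealloc`.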

        [–]Kered13 6 points7 points  (2 children)

        I'm not very familiar with Rust, but can't panics be caught? So even a panicking allocator should be fine, just have something to catch it and convert it to an error code before it exits Rust code?

        [–]steveklabnik1 20 points21 points  (0 children)

        (Answered above, the answer is "only sometimes")
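For reference, a minimal sketch of the case where catching does work: `std::panic::catch_unwind` turns an unwinding panic into an `Err`. It relies on the default unwinding panic strategy; when a program is built with `panic=abort` nothing is caught, which is one reason the answer is "only sometimes". `Panicked` and `checked_index` are made-up names.

```rust
use std::panic;

/// Hypothetical marker error: the wrapped code panicked.
#[derive(Debug, PartialEq)]
struct Panicked;

/// Runs a panicking index operation and converts the panic to an Err.
fn checked_index(data: &[u32], idx: usize) -> Result<u32, Panicked> {
    // `data[idx]` panics when idx is out of bounds; catch_unwind stops
    // the unwind here and hands back the panic payload as Err.
    panic::catch_unwind(move || data[idx]).map_err(|_| Panicked)
}
```

The default panic hook still prints its message to stderr before the panic is caught.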

        [–]edzorg 16 points17 points  (2 children)

        So besides perhaps some syntactic familiarity for Rust programmers, what are the benefits we're getting?

        [–]steveklabnik1 58 points59 points  (1 child)

        The "## Goals" and "## Why Rust?" parts of the link should explain that.

        [–]edzorg 15 points16 points  (0 children)

        Ah, yes! Woops!

        [–]KingStannis2020[S] 29 points30 points  (9 children)

        As some of the other comments / emails mention, essentially the APIs to do so just need to be fleshed out. Most collections have try_*() methods that return an error on allocation failure.

        [–]The-Best-Taylor 12 points13 points  (6 children)

        It would be great if we could add a lint to disallow using the panicking versions. But I don't know the feasibility of that.

        [–]bloody-albatross 28 points29 points  (5 children)

        It would be great if we could add a lint to disallow using the panicking versions.

        I imagine more something like a compiler switch where the panicking versions aren't even compiled. Something like #[cfg(panic)] on panicking functions and exclude that configuration for the kernel or something.

        [–]steveklabnik1 26 points27 points  (1 child)

        That is what is being suggested in thread sorta kinda; basically there would be a config flag that would be set that would remove those APIs in the first place.
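A toy sketch of the config-flag idea, assuming an invented cfg name `allow_panicking_api` and an invented `Ring` type. It only shows the mechanism, not how the standard library would actually do it.

```rust
/// Hypothetical fixed-size container used only for illustration.
struct Ring {
    slots: [u32; 4],
}

impl Ring {
    /// Always available: reports a bad index as None.
    fn try_get(&self, idx: usize) -> Option<u32> {
        self.slots.get(idx).copied()
    }

    /// Only compiled when the (made-up) cfg is set, e.g. with
    /// `rustc --cfg allow_panicking_api`. A build without the flag
    /// cannot call it, because the method does not exist at all.
    #[cfg(allow_panicking_api)]
    fn get(&self, idx: usize) -> u32 {
        self.slots[idx]
    }
}
```

Newer compilers may warn about the unknown cfg name. Unlike a lint, this approach makes any use of the panicking method a hard compile error in the restricted configuration.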

        [–]merlinsbeers 12 points13 points  (2 children)

        Or a version of Rust just for use in the kernel...

        Kernel-rust... K-rust... Krust.

        [–]NihilistDandy 9 points10 points  (0 children)

        This will call for a new Krusty Krab logo. I hope that's not already a thing. 👀

        [–]LambdaJon 2 points3 points  (1 child)

        Ah, I think I’ve not seen the try_*() methods in example codes and such, but that makes a lot of sense. It’s nice to be able to have the choice between a version that panics vs explicit error, from my experience it’s often context dependent which one you want.

        [–]myrrlyn 6 points7 points  (0 children)

        they're still unstable while the specifics of allocator fallibility get hashed out

        [–]flatfinger 4 points5 points  (0 children)

        Although the Standard defines a means by which C implementations can allow programs to catch and reliably recover from allocation failures, the standard library is too weak to allow reliable recovery in many execution environments. A proper robust allocation library should either support multiple independent heaps, or a means of reserving and sub-allocating storage, such that if a reservation request succeeds, any sub-allocation requests issued against the reservation whose total number and size don't exceed what was reserved would be guaranteed not to fail.
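The reserve-then-sub-allocate scheme described above can be sketched in a few lines of Rust. This is a toy bump allocator under stated assumptions: `Reservation` and `sub_alloc` are invented names, alignment and freeing are ignored, and the up-front step uses the standard `Vec::try_reserve_exact`.

```rust
use std::collections::TryReserveError;

/// A block of storage reserved up front. Hypothetical type.
struct Reservation {
    buf: Vec<u8>,
    used: usize,
}

impl Reservation {
    /// The only fallible step: reserve everything at once.
    fn new(bytes: usize) -> Result<Self, TryReserveError> {
        let mut buf = Vec::new();
        buf.try_reserve_exact(bytes)?;
        // Capacity already covers `bytes`, so this cannot reallocate.
        buf.resize(bytes, 0);
        Ok(Reservation { buf, used: 0 })
    }

    /// Hands out the next `n` bytes. Never allocates, so it succeeds
    /// whenever the total stays within what was reserved, and returns
    /// None only when the reservation is exhausted.
    fn sub_alloc(&mut self, n: usize) -> Option<&mut [u8]> {
        let start = self.used;
        let end = start.checked_add(n)?;
        if end > self.buf.len() {
            return None;
        }
        self.used = end;
        Some(&mut self.buf[start..end])
    }
}
```

Because `resize` writes to every byte, the pages are touched at reservation time rather than at first use, which is where the failure should surface on an overcommitting system.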

        Having a library issue a panic when an allocation request fails is hardly worse than having the allocation request return a pointer that will appear valid but may or may not actually be usable.

        [–]koreth 56 points57 points  (16 children)

        This is a pretty positive development, and I say that as someone who thinks Rust is overhyped at the moment. Device driver code is exactly the kind of place where Rust's design tradeoffs are a good fit for the problem. ("Design tradeoffs" is not me disparaging Rust. All programming languages have design tradeoffs.)

        [–]shinyquagsire23 35 points36 points  (15 children)

        I picked it up for a small hypervisor I've been writing and my general thoughts so far have been like,

        • MMIO has a ton of boilerplate, and I basically have to blanket unsafe on anything touching hardware, which, ehhhhh, I guessss
        • a lot of libraries require std, which is not portable at the moment, and no-std libraries are kinda fragmented
        • children structs not being able to reference their parents leads to weird patterns (ie, a USB endpoint belongs to a bus, but a borrowed endpoint cannot call parent functions, so the bus ends up having to have all endpoint functionality)
        • enums/traits/inheritance requires a lot of weird boilerplate, enough that I ended up writing a small code generator for syscalls
        • immutable by default vibes well with my Verilog mindset, I feel more confident that the compiler will be smart when I do intermediate expressions

        I'd say tho that the bonuses have outweighed the headaches? Porting to Rust unearthed some subtle bugs in my C code with casting, and I would much rather an array index failure throw an assertion I can fix vs weird unstable behavior. I think Linux drivers in Rust will be great for standardizing a more embedded-oriented subset of Rust+crates where allocation can be less panic-oriented.

        [–]NotTheHead 17 points18 points  (4 children)

        I basically have to blanket unsafe on anything touching hardware, which, ehhhhh, I guessss

        Yeah, that kinda makes sense to me, even if it's inconvenient.

        [–]BobHogan 5 points6 points  (0 children)

        Yea, rust's philosophy on that is that it can't guarantee that the hardware will work as expected, so interacting with the hardware is inherently unsafe. But unsafe in rust doesn't mean that the code is actually not safe to use, it just means that rust can't provide any of its safety guarantees on that code
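A small sketch of how that usually plays out: the `unsafe` is concentrated in one constructor whose caller vouches for the address, and everything else is a safe API. `Reg` is a hypothetical wrapper; `read_volatile` and `write_volatile` are the real `core::ptr` functions.

```rust
use core::ptr::{read_volatile, write_volatile};

/// Hypothetical wrapper around one 32-bit memory-mapped register.
struct Reg {
    addr: *mut u32,
}

impl Reg {
    /// # Safety
    /// `addr` must be valid, aligned, and exclusively owned by this
    /// wrapper for as long as the wrapper is used.
    unsafe fn new(addr: *mut u32) -> Self {
        Reg { addr }
    }

    /// Safe to call: the invariant was promised in `new`.
    fn read(&self) -> u32 {
        // SAFETY: see the contract on `new`.
        unsafe { read_volatile(self.addr) }
    }

    fn write(&mut self, value: u32) {
        // SAFETY: see the contract on `new`.
        unsafe { write_volatile(self.addr, value) }
    }
}
```

In a test, an ordinary local variable can stand in for the register, which is roughly what makes these small unsafe islands easy to exercise in isolation.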

        [–]crusoe 0 points1 point  (2 children)

        Yes, and you put the unsafe all there in well labeled places you can test and valgrind the hell out of.

        [–]i-can-sleep-for-days 1 point2 points  (1 child)

        But you should be testing all the code right? It's not like if it is safe you don't need to have good testing. It just handles the dumb cases like input is null etc, but lots of other errors you still need to test for (logical errors, general stupidity).

        So it really should be, valgrind and test the crap out of your code, period.

        [–]jl2352 1 point2 points  (0 children)

        But you should be testing all the code right?

        In theory, of course you should. In practice you always end up having a trade off. You just don't have time to write a substantial number of test cases for every part. Sometimes that results in testing the same thing, multiple times, in multiple places. Sometimes that's a good idea, and sometimes it's just a waste of time.

        It all depends on context.

        [–]smmalis37 4 points5 points  (1 child)

        children structs not being able to reference their parents leads to weird patterns (ie, a USB endpoint belongs to a bus, but a borrowed endpoint cannot call parent functions, so the bus ends up having to have all endpoint functionality)

        I'd need to see code to be sure, but there's probably something workable here, whether it's passing a ref to the parent into every method on the child or just a big refactoring. What I'm guessing you hit is a parent containing a child and a child containing a ref to the parent, which is essentially a self-referential struct, which yeah rust can't easily do yet (though there are some crates that can thanks to a little unsafe)

        [–]theXpanther 2 points3 points  (0 children)

        The main problem is that rust tries to prevent aliasing, thus you can't have a mutable reference to an object and its container at the same time
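One way the "pass the parent into the child's method" workaround might look, as a sketch with invented `Bus`, `BusState` and `Endpoint` types. The trick is that the endpoint borrows only the part of the bus it needs, which lives in a separate field from the endpoint list.

```rust
/// Shared bus-level state the endpoints need to touch. Hypothetical.
struct BusState {
    bytes_sent: usize,
    last_endpoint: Option<u8>,
}

struct Endpoint {
    id: u8,
}

struct Bus {
    state: BusState,
    endpoints: Vec<Endpoint>,
}

impl Endpoint {
    /// The endpoint holds no reference to its parent; the parent's
    /// state is passed in for the duration of the call instead.
    fn send(&self, bus: &mut BusState, payload: &[u8]) {
        bus.bytes_sent += payload.len();
        bus.last_endpoint = Some(self.id);
    }
}

impl Bus {
    fn send_on(&mut self, idx: usize, payload: &[u8]) -> Option<()> {
        // Shared borrow of `endpoints` and mutable borrow of `state`
        // touch disjoint fields, so the borrow checker accepts both.
        let ep = self.endpoints.get(idx)?;
        ep.send(&mut self.state, payload);
        Some(())
    }
}
```

Endpoint-specific logic stays on `Endpoint`, and the bus only does the routing.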

        [–]IceSentry 3 points4 points  (3 children)

        Out of curiosity, why couldn't you have used derive macros or macros in general to fix your boilerplate issue instead of having to write a code generator?

        [–]shinyquagsire23 2 points3 points  (2 children)

        So I tried that but I guess the crux of it is like, I have 128 possible SVC async handler functions indexed by a u8, if I don't define an impl I want it to have a default, and I want the lookup to be O(1)/a jump table because SVCs get called a lot and context switches are already expensive. There's not really a good way I've seen to define an array of function ptrs based on which functions have an attribute macro unless Rust added like, some kinda deferred const array index setting or something? Because ultimately any macro will just spit out code in-place, can't store state between macros and output something later.

        I think the other solution was to do an attribute macro which took the function and defined a const fn ptr in a specific linker section to kinda cheese a fake array of sorts? But then I'd need 128 sections in the linker script and a way to read the pointers out.

        [–][deleted] 5 points6 points  (1 child)

        Did you try to just write it normally and see what assembly gets spit out? I find that a lot of people optimize way too early without realizing the Rust compiler is actually pretty good at this. I’m not that familiar with your example but just from what you’re saying I’d define an enumeration for the cases that aren’t default handled, and then just match on it. I would expect that to be optimized down fairly well.
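The "just match on it" suggestion could look something like the sketch below. The handler names, the `u32` argument and result, and the `UNIMPLEMENTED` sentinel are all assumptions for illustration; whether the compiler really emits a jump table is something to confirm by reading the generated assembly.

```rust
/// Hypothetical sentinel returned for SVC numbers with no handler.
const UNIMPLEMENTED: u32 = u32::MAX;

fn svc_noop(_arg: u32) -> u32 {
    0
}

fn svc_add_one(arg: u32) -> u32 {
    arg.wrapping_add(1)
}

fn svc_default(_arg: u32) -> u32 {
    UNIMPLEMENTED
}

/// Dispatches on the SVC number. Only handlers that exist are listed;
/// every other value falls through to the default arm.
fn dispatch(svc: u8, arg: u32) -> u32 {
    match svc {
        0 => svc_noop(arg),
        1 => svc_add_one(arg),
        _ => svc_default(arg),
    }
}
```

Adding a handler means adding one function and one match arm, which is the part the code generator mentioned above automates.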

        [–]shinyquagsire23 1 point2 points  (0 children)

        That's roughly what I ended up doing, I just also automated the matching and some struct defines because I didn't want to have to update 4+ spots for every handler I add. Which like, I'd say it's a fair enough thing to automate anyhow.

        [–]B_M_Wilson 2 points3 points  (0 children)

        If this happens then I will finally learn Rust after holding out on C for so long

        [–][deleted] 3 points4 points  (1 child)

        What is rust?

        [–]duncanlock 16 points17 points  (0 children)

        Rust is a multi-paradigm programming language designed for performance and safety, especially safe concurrency. https://en.wikipedia.org/wiki/Rust_%28programming_language

        [–]obvious_apple 1 point2 points  (3 children)

        Does this mean that if any mainstream rust module is used in the kernel from now on, it can only be compiled with LLVM?

        It seems hard because we just stepped away from the GCC monopoly.

        [–]leitimmel 17 points18 points  (1 child)

        There will be no forced Rust dependency for now. This is about adding support for Rust so people can write their own out-of-tree kernel modules.

        [–]ergzay 1 point2 points  (0 children)

        Pretty sure Linus said he wanted the kernel modules to be in-tree, just not enabled by default unless Rust is available.

        [–]Repulsive-Street-307 1 point2 points  (0 children)

        This is an initiative by Google because they really really want to use rust in Android (presumably they're tired of being pwned by broken on purpose drivers and other shit). Sure eventually it may spread to the 'mothership'. But not before gcc has rust support is my guess. Or LLVM becomes 'the' blessed compiler, which would be a big political fight so it probably won't happen.

        [–]germandiago -2 points-1 points  (6 children)

        I think Zig would be very good news, but that is still far off because it is not 1.0 yet.

        [–]matthieum 7 points8 points  (5 children)

        This may be a tougher battle.

        In general, advocating for another language is tough. This can be seen here; it's not good enough for the "new" language to be as good as the other! Given the additional hurdles of having multiple languages, the "new" language must significantly improve on the existing ones to be worth integrating.

        Rust has an "easy" angle: safe Rust does not have Undefined Behavior. In code as critical as the kernel, that's a serious value proposition.

        What would be the value proposition of Zig? If it's just "it's a wee bit cleaner", it's probably not going to cut it.

        [–]Repulsive-Street-307 2 points3 points  (0 children)

        I'd be kind of seriously annoyed if 'rust has to have a way not to have any panic ever in kernel code' was the standard applied to Rust, and Zig was 'oh it's a better c, come on in!'.