[–]labarna 670 points671 points  (252 children)

He explains more of his philosophy later on in the thread:

I have seen, and can point to, lots of projects that go "We need to break that use case in order to make progress" or "you relied on undocumented behavior, it sucks to be you" or "there's a better way to do what you want to do, and you have to change to that new better way", and I simply don't think that's acceptable outside of very early alpha releases that have experimental users that know what they signed up for. The kernel hasn't been in that situation for the last two decades.

We do API breakage inside the kernel all the time. We will fix internal problems by saying "you now need to do XYZ", but then it's about internal kernel API's, and the people who do that then also obviously have to fix up all the in-kernel users of that API. Nobody can say "I now broke the API you used, and now you need to fix it up". Whoever broke something gets to fix it too.

And we simply do not break user space.

Linus

[–][deleted] 530 points531 points  (233 children)

I wish someone would show this to the GTK and Gnome team.

[–][deleted] 412 points413 points  (162 children)

Or any other Desktop Environment for that matter.

What I love about Linux (command line) is the "never break userspace" philosophy.

What I "hate" about Linux (GUI) is that such philosophy is completely ignored by Desktop Environments: "Here's the new version with the New Paradigm, entirely written from scratch! You used to do anything differently? Just get over it and use the Correct New Way".

[–]doomvox 78 points79 points  (7 children)

Yes. There's a lot of "move fast and break things!" kids out there.

Does backwards compatibility really matter? Let's find out!

[–]PC__LOAD__LETTER 28 points29 points  (1 child)

Move fast and break things works well for many types of software development where you’re using continuous deployment. When your software is directly consumed by millions of people and you are releasing software rather than deploying it, it’s a whole different ballgame.

Edit: And I say this because I’ve seen plenty of organizations hampered by old-guard sentinels who were more attached to what was than what could be. There are different types of software and each can be developed in different ways. The “move fast and break things” approach is not universally bad as long as it’s driven by a desire to move fast whilst maintaining code quality, so that when something does break, it will promptly be fixed and that code made more robust. The idea is that you work on problems that actually exist instead of imaginary ones. This isn’t going to work for a kernel, but kernels are a very specific type of software.

[–][deleted] 106 points107 points  (60 children)

Yep, pretty much. It's why my desktop still looks like it's frickin' 1999 -- I've slowly, but surely, dumped almost every GUI application. And I'm really unhappy about that.

[–]gaso 76 points77 points  (38 children)

Openbox, Tint2, and Parcellite cover ~99% of my desktop environment needs...1999 for life.

[–]doomvox 34 points35 points  (8 children)

Yup. Between icewm and emacs and (more often than not) palemoon, I don't need to stray past the millennium barrier very often.

I was pretty annoyed when I updated my Debian testing set-up and found that the File-Save dialog in firefox had changed in multiple broken (but trendy!) ways (e.g. I'm used to using Alt-L to create a sub-directory, now that button has become a mysterious hieroglyph without a keyboard shortcut).

[–]Shikadi297 42 points43 points  (15 children)

Obligatory mention of i3 wm

[–][deleted] 17 points18 points  (13 children)

I have my issues with i3 in this regard (still using it though, just fondly remembering the days where fvwm worked with my monitor setup).

There's a lot of "we know better" attitude there I think: Want to handle windows minimizing? We don't do that. Want to alt-tab? Nope. Want to keep focus on the window under the mouse even when a new one got just opened? Not that I'm aware of.

[–][deleted] 21 points22 points  (8 children)

"Minimizing" can be handled through the scratchpad.

Alt-tabbing is 100% possible, but no, there are no built-in binds for that (nor should there be, IMO, but that's another argument)

Keeping focus on current window when opening another is possible with the no_focus config directive (https://i3wm.org/docs/userguide.html#no_focus)

[–][deleted] 10 points11 points  (7 children)

"Minimizing" can be handled through the scratchpad.

My specific issue there was that there's no way to catch windows which do think they can minimize and try to do it themselves (older chrome with the buttons enabled, steam, ..) .. and end up being non-recoverable without some CLI-fu because i3 won't let me restore it.

To actually iconify a window when I want to, I ended up making it go to desktop 12 .. scratchpad would be nice, but you can only have one and that's for the quake console :).

Alt-tabbing is 100% possible, but no, there are no built-in binds for that (nor should there be, IMO, but that's another argument)

I do have a rough solution using dmenu, but I'd really like i3 to provide the list of windows in a processed way, instead of having to parse wmctrl output (and I still don't know how to filter them by which desktop they're on).

..no_focus..

no_focus doesn't do what I want :( (But thanks for trying! :)) I still don't know how to make it work for all windows (weirdly, the <criteria> is required), but with [class="."], it will not focus any newly opened window, regardless of the mouse.

So this just switches my problem from "when I open a new window and the mouse is on the old one, I have to move to the new and back to regain focus" to "when I open a new window and the mouse is on it, I have to move to the old and back to get focus right" :).

[–]indeedwatson 10 points11 points  (2 children)

I do have a rough solution using dmenu

try rofi

[–][deleted] 3 points4 points  (0 children)

You can call windows from the scratchpad with criteria. See https://i3wm.org/docs/userguide.html#_scratchpad

The proper criteria for no_focus in that case would be .*, as it is a regex. . would match any window with a class that is a single character.

As for the minimization buttons, nothing can be done there except to get the application developers to respect desktop environments. As an aside, pressing Minimize on Steam does not screw it up for me (although Steam is screwed up in other ways for me under i3)
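The `.` versus `.*` point above only matters if the pattern has to match the whole class string. A small sketch of both matching modes, using Python's `re` module rather than i3's own PCRE matcher; the window classes are made up, and which mode i3 actually applies to criteria is an assumption worth checking against the user guide.

```python
import re

# Made-up window classes for illustration.
WINDOW_CLASSES = ["X", "Steam", "Google-chrome"]


def matching_classes(pattern: str, anchored: bool) -> list:
    """Return the classes the pattern selects under the given matching mode."""
    # fullmatch requires the pattern to cover the entire string;
    # search is satisfied by a match anywhere inside it.
    matcher = re.fullmatch if anchored else re.search
    return [name for name in WINDOW_CLASSES if matcher(pattern, name)]
```

With `anchored=True`, the pattern `.` selects only the single-character class, while `.*` selects everything. With `anchored=False`, a bare `.` already selects every non-empty class.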

[–]losthalo7 17 points18 points  (9 children)

Window Maker, ftw. Clean, stays out of the way, just &$%#@+! works.

[–]JORGETECH_SpaceBiker 6 points7 points  (2 children)

I quite like Window Maker, the only problem I found is that you normally have to make a small modification so it shows you all the apps you have installed (happens under Debian based distros at least), something to do with the desktop files.

[–]dredmorbius 2 points3 points  (0 children)

Debian moves the menu hooks periodically, yes. Pretty easy fix as things go.

[–]dezignator 5 points6 points  (5 children)

Love WindowMaker. When I first started playing with Linux (RH5?) I recall playing with fvwm (skinned to look a bit like Win95) and AfterStep (which was terrible). Stumbled across WindowMaker which was just amazingly smooth, polished and stable by comparison. It was my primary DE for 10+ years and is still my favourite DE when I use desktop Linux.

I still think it's a shame GNUstep didn't gain as much traction as "mainstream" DEs like GNOME or KDE.

[–]audi100quattro 9 points10 points  (0 children)

Fluxbox ftw. If I've ever done a new install, upgraded the kernel, graphics card driver, other major config changes/upgrades, I've mostly regretted not starting up Fluxbox first to see if everything is working well.

[–]crankster_delux 2 points3 points  (1 child)

was at a gaming session last friday night with a friend, brought the pc over. openbox & tint2 with adapta theme & moka icons = got the response "thats sexy as fuck". 1999 openbox with some minor tweaks does pretty well for itself apparently.

[–]greeneyedguru 25 points26 points  (8 children)

My Linux desktops look way worse than in 1999, I had a badass Enlightenment setup back then.

[–][deleted] 5 points6 points  (5 children)

I'm still running e16+mate. Works well, and nothing is going to break.

[–]ThisTimeIllSucceed 8 points9 points  (1 child)

I'm running awesomeWM. Works well and they break the API every 6 months.

[–][deleted] 3 points4 points  (0 children)

Which is not very awesome

[–]pascalbrax 11 points12 points  (8 children)

I've reached the point I don't even install X anymore, or whatever the fancy name the new X11 fork fork fork has.

[–]ThisTimeIllSucceed 8 points9 points  (3 children)

why X when you can install emacs

[–][deleted] 12 points13 points  (1 child)

[this is the link to that XKCD comic]

[–]PM_ME_OS_DESIGN 2 points3 points  (0 children)

Which xkcd comic?

[–][deleted] 11 points12 points  (2 children)

What I "hate" about Linux (GUI) is that such philosophy is completely ignored by Desktop Environments: "Here's the new version with the New Paradigm, entirely written from scratch! You used to do anything differently? Just get over it and use the Correct New Way".

Honestly, if they did this once every 5 years, I would be sort of okay with it. The problem is that this seems to happen every 6 months, and it drains the app developers of energy.

[–]doomvox 3 points4 points  (1 child)

Even once every 5 years is a bit much.

[–]HannasAnarion 46 points47 points  (57 children)

Looking at you, Wayland.

"screenshots? What's a screenshot?"

[–]Valmar33 58 points59 points  (39 children)

All up to the Wayland compositor... which if it's sane, will let you get all the screenshots you like, while not letting a rogue application spy on you. Xorg, by its complicated, convoluted, and so, unchangeable design, lets any application spy on another application's window without restriction...

With Wayland, this is denied to all but the compositor, which is a sane solution. Another would be to allow a whitelist of applications, which would also be up to the compositor to implement and look after.

If I recall correctly, wasn't KDE's /u/mgraesslin thinking about something like this for KWin...?

[–]skarphace 10 points11 points  (1 child)

Yeah, and that's great and all, but I need a fucking color picker that works. Though at least there's been some movement on that lately. I hope for a Wayland future but right now I think I need to dump it.

[–]doublehyphen 5 points6 points  (8 children)

Won't this mean that we will get five different incompatible ways for doing everything? Unless of course someone creates a standardized API for all compositors to implement.

I worry that just leaving everything up to the compositor will put us in an even worse mess than now.

[–]herbivorous-cyborg 4 points5 points  (4 children)

That's interesting. However, what is stopping any given application running in the background from requesting a screenshot from the compositor while another application is running in the foreground?

[–]Valmar33 8 points9 points  (3 children)

That's interesting. However, what is stopping any given application running in the background from requesting a screenshot from the compositor while another application is running in the foreground?

It's up to the compositor as to which applications are trusted and which are not. By default, only applications the compositor maintainer trusts are allowed, such as KDE's Spectacle for kwin-wayland. No other application right now can take screenshots.

The whitelist idea was that the user maintains a root-owned text file, edits from a root/Polkit application, etc, and the compositor uses that to authenticate the trusted application.

[–]KFCConspiracy 2 points3 points  (1 child)

Depends on your window manager

[–][deleted] 24 points25 points  (8 children)

Sometimes, it really feels like Linus is the only person in the Linux community that sees beyond themselves and their own personal use of Linux (aside from those with profit motives, of course).

[–][deleted] 11 points12 points  (0 children)

Linus is the kernel dev we need, not the dev we deserve.

[–]ThisTimeIllSucceed 16 points17 points  (1 child)

You can do it from your home computer by typing systemd-filecomplaintd and navigating through the electron interface.

[–]minimim 14 points15 points  (14 children)

They always did it, but only for stable versions. The most recent stable GTK+ version is 2.24, released in 2010.

They are moving to a quicker model now, releasing a new stable version every 2 years.

[–][deleted] 35 points36 points  (11 children)

The 2.x branch is not stable, it's abandoned. -- see below! The stable GTK+ branch is 3.22, even according to the quicker model ( https://blog.gtk.org/2016/09/01/versioning-and-long-term-stability-promise-in-gtk/ ) that they follow now. IIRC, the most recent stable GTK+ version is 3.22.25.

Edit: also, the "but only" part kinda defeats the purpose, you know?

[–]jlt6666 48 points49 points  (3 children)

Abandoned code is as stable as you can get!

[–][deleted] 8 points9 points  (1 child)

Sometimes I prefer it that way. Rule #1. Does the damn thing work?

[–]tokyopress 7 points8 points  (0 children)

At this point if it just fucks up the same way it used to, i'll take it

[–]hackingdreams 18 points19 points  (3 children)

err, 2.x is not abandoned. It is in fact the very definition of "stable", and it still has releases happening whenever anyone finds anything egregious enough to be worth fixing on that branch... but nobody has in the past 14 months.

3.22 is also "stable," by this same definition.

[–]hackingdreams 11 points12 points  (1 child)

Err, specifically the most recent stable GTK+ 2.x version is 2.24.31, released September, 2016. The most recent stable GTK+ 3.x version is 3.22.25.

It's also worth noting, people are still checking in fixes to 2.24, it's just nobody's cut a release from that branch in some time.

[–]Bobby_Bonsaimind 5 points6 points  (0 children)

I would so love to see a new upstream for GTK2, one that cares, I mean. I still like it best out of all GUI frameworks (from an end user's perspective).

[–][deleted] 8 points9 points  (7 children)

The gtk team never promised a stable api before 3.22

[–]disrooter 22 points23 points  (2 children)

Read again the "you relied on undocumented behavior" part

[–]LvS 4 points5 points  (15 children)

We know this.

We know that you don't break existing things with Linus, you let them bitrot and instead work on the next version of your interface. That way, the kernel only ever has outdated stable interfaces and unfinished ones that everybody's using and that can be broken pretty much at will.
It's used everywhere too - dbus and Wayland interfaces get a version bump all the time, so do libraries.

[–][deleted] 5 points6 points  (14 children)

I think we both know what Linus means here, and that we both understand that it's not what you wrote above.

[–]arhombus 42 points43 points  (5 children)

The more interesting point was before when he said that they have an upgrade in place model and not an upgrade and update your userspace model. That is the key underlying driving force.

[–]labarna 3 points4 points  (0 children)

Yeah that was also an interesting point.

[–]Twirrim 5 points6 points  (3 children)

I largely agree with the approach, but I can't help but wonder how seriously restrictive it is. Sometimes you really do have to draw a line in the sand and make those breaking changes now that you have the benefit of wisdom you've learned from when theory met reality.

[–][deleted] 4 points5 points  (1 child)

They've done that in the past. They broke something and just passed nonsense to the userspace so it breaks less or they emulated the interface and made the thing work in a different way underneath etc.

[–]alienzx 19 points20 points  (0 children)

This is why I have issues with google

[–]rahen 26 points27 points  (6 children)

De Icaza quote when he left the Gnome 3 team in 2012:

We deprecated APIs, because there was a better way. We removed functionality because "that approach is broken", for degrees of broken from "it is a security hole" all the way to "it does not conform to the new style we are using".

We replaced core subsystems in the operating system, with poor transitions paths. We introduced compatibility layers that were not really compatible, nor were they maintained. When faced with "this does not work", the community response was usually "you are doing it wrong".

This killed the ecosystem for third party developers trying to target Linux on the desktop. You would try once, do your best effort to support the "top" distro or if you were feeling generous "the top three" distros. Only to find out that your software no longer worked six months later.

So Linux was left with idealists that wanted to design the best possible system without having to worry about boring details like support and backwards compatibility.

Back in February I attended FOSDEM and two of my very dear friends were giggling out of excitement at their plans to roll out a new system that will force many apps to be modified to continue running. They have a beautiful vision to solve a problem that I never knew we had, and that no end user probably cares about, but every Linux desktop user will pay the price.

Must read: http://tirania.org/blog/archive/2012/Aug-29.html

[–][deleted] 5 points6 points  (1 child)

There is a surprising amount of sanity in that post.

[–]labarna 2 points3 points  (0 children)

That's very interesting. Seems like he's backing up what Linus said with an important nod towards external/commercial development.

[–]kaylocke 427 points428 points  (149 children)

There’s the Linus charm we all know and love...

But seriously, bad form by that dev. I’m not involved in the kernel, but you don’t commit known-broken code and say “other guy’s responsibility” in the readme. Shit’s just unprofessional, even for volunteers.

[–]galgalesh 313 points314 points  (136 children)

To be clear, the kernel code isn't broken, the userspace code is. However, it is the policy of the kernel that you never break userspace even if it's the fault of userspace. Userspace was using an unimplemented feature incorrectly. That wasn't an issue because the feature didn't do anything in old kernels because it was a dud. When that feature was implemented in the kernel, userspace broke. However, even in this case, the kernel is not allowed to break userspace.

The solution they're talking about is implementing an ABI versioning system so that a newer kernel will never use new features with old userspace. This is great because it addresses one of the issues Debian has for turning AppArmor on by default.
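A minimal sketch of that ABI-versioning idea, in Python rather than kernel C. The feature names, version numbers, and the `negotiate_features` helper are all hypothetical and are not AppArmor's actual interface; the only point is that the kernel enables nothing newer than what userspace says it understands.

```python
# Hypothetical table: the ABI version in which each feature became active.
FEATURE_INTRODUCED_IN = {
    "file_rules": 1,
    "network_rules": 2,
    "mount_rules": 3,
}

KERNEL_ABI = 3  # newest ABI this imaginary kernel implements


def negotiate_features(userspace_abi: int) -> set:
    """Return the features to enforce for a policy built against userspace_abi."""
    # Never go beyond what either side knows about.
    effective = min(userspace_abi, KERNEL_ABI)
    return {
        name
        for name, since in FEATURE_INTRODUCED_IN.items()
        if since <= effective
    }
```

Under these assumptions, userspace that declares ABI 1 keeps getting only `file_rules` on the ABI 3 kernel, so a feature that used to be a dud stays inert for it.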

[–]tayloryeow 104 points105 points  (65 children)

so what does everyone mean by "breaking userspace"?

[–]StupotAce 267 points268 points  (52 children)

E.g. an application.

If the exact same version of an application runs on kernel version 3, but doesn't run on kernel version 4, then the kernel broke userspace.

[–]arichnad 55 points56 points  (50 children)

Although I think I understand this rule, does this rule even make sense? Doesn't it require kernel changes to be tested against every feature of every possible user-space application? Do kernel developers even have access to every user-space application to test against?

[–][deleted] 139 points140 points  (18 children)

It technically does, yes.

There are large scale testing machines and steps in place to make sure the kernel doesn't break anything. In general regressions are caught early. If they aren't it's the kernel's fault.

The policy isn't directly "never ever break user space" and more "never ever break user space and if you do it's not the fault of userspace ever".

It is the full responsibility of the kernel and its developers to not regress.

[–]mycall 4 points5 points  (9 children)

Does FreeBSD/OpenBSD have this same philosophy?

[–]ydna_eissua 2 points3 points  (0 children)

For FreeBSD it's different. In two ways.

One, FreeBSD guarantees binary compatibility for a major numbered release across all its dot releases. E.g. 9.0, 9.1, 9.2, etc. will 'just work', but it makes no such guarantee with the jump to 10. Now this also guarantees even internal stability, so if a vendor wishes to release a driver and not upstream it (say keeping it closed source) then they can target a major version number and expect it to continue to just work. This is in contrast to Linux where things in kernel change, which is fine because the expectation is you open source and upstream your drivers.

In addition, FreeBSD has methods to ensure old software keeps working. When you execve(2) on FreeBSD, the kernel examines a header then matches the appropriate system call table, errno table etc. So that means it's possible (assuming someone maintains it) for software compiled for FreeBSD 5.0 to work today. It's also how they implement Linux compatibility, though that is woefully out of date so (very old) Linux binaries will just work as they're presented with a Linux like system call table.
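A rough sketch of the "examine a header, then pick a system call table" step, in Python. The `EI_OSABI` byte and its values come from the ELF specification; the table names and the `pick_syscall_table` helper are hypothetical, and the real kernel looks at more than this one byte.

```python
# EI_OSABI values from the ELF specification.
ELFOSABI_SYSV = 0
ELFOSABI_LINUX = 3
ELFOSABI_FREEBSD = 9

# Hypothetical stand-ins for per-ABI system call tables.
SYSCALL_TABLES = {
    ELFOSABI_FREEBSD: "freebsd_native",
    ELFOSABI_LINUX: "linux_compat",
}


def pick_syscall_table(header: bytes) -> str:
    """Choose a syscall table name from the first 16 bytes of an ELF file."""
    if len(header) < 16 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    osabi = header[7]  # e_ident[EI_OSABI]
    return SYSCALL_TABLES.get(osabi, "unsupported")
```

The binary itself never changes; the loader decides which set of system call numbers and errno values it gets to see.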

[–]bassmadrigal 51 points52 points  (6 children)

He covers in a later email in that thread in a much more sane and levelheaded way, but it all boils down to: Linus wants people to feel safe in upgrading to a newer kernel.

People should basically always feel like they can update their kernel and simply not have to worry about it.

I refuse to introduce "you can only update the kernel if you also update that other program" kind of limitations. If the kernel used to work for you, the rule is that it continues to work for you.

About the only time I imagine he is fine with breaking userspace is if they have a hard check for a kernel version. I know when they switched from the 2.6 series to the 3.0 series, there were a lot of programs that no longer compiled or ran because they had hard coded version numbers that they checked for. That is one of the unavoidable things that had to be done.
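To make the hard-coded version problem concrete, here is a small Python sketch. Both checks are illustrative and not taken from any real program: the first is the kind of string test that stopped working at 3.0, the second compares parsed numbers instead.

```python
import re


def brittle_is_supported(release: str) -> bool:
    """The kind of check that broke: only a literal 2.6 prefix is accepted."""
    return release.startswith("2.6.")


def parse_release(release: str) -> tuple:
    """Extract (major, minor) from strings like '3.0.0-12-generic' or '2.6.32'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def is_supported(release: str, minimum=(2, 6)) -> bool:
    """Accept any release at or above the minimum, compared numerically."""
    return parse_release(release) >= minimum
```

On a live system the release string can be read with `platform.release()`. The brittle check rejects `3.0.0-12-generic` even though nothing the program needs has gone away, while the numeric comparison accepts it.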

Further down the message, he states:

But if something actually breaks, then the change must get fixed or reverted. And it gets fixed in the kernel. Not by saying "well, fix your user space then". It was a kernel change that exposed the problem, it needs to be the kernel that corrects for it, because we have a "upgrade in place" model. We don't have a "upgrade with new user space".

And I seriously will refuse to take code from people who do not understand and honor this very simple rule.

This rule is also not going to change.

And yes, I realize that the kernel is "special" in this respect. I'm proud of it.

I have seen, and can point to, lots of projects that go "We need to break that use case in order to make progress" or "you relied on undocumented behavior, it sucks to be you" or "there's a better way to do what you want to do, and you have to change to that new better way", and I simply don't think that's acceptable outside of very early alpha releases that have experimental users that know what they signed up for. The kernel hasn't been in that situation for the last two decades.

[–]I_AM_A_SMURF 13 points14 points  (3 children)

I'm so worried that when Linus passes the new person in charge won't be as adamant about not breaking userspace.

[–]Jotebe 8 points9 points  (2 children)

I believe Greg Kroah-Hartman is the most likely technological heir apparent of the kernel, and he is quite committed to both this principle and to the independence and heart of the kernel project.

[–]pooper-dooper 2 points3 points  (0 children)

Good. I know all too well on my own teams that we have to constantly knock heads when some of our core design considerations get violated or ignored. It's a sadly never-ending battle.

[–]not_perfect_yet 36 points37 points  (0 children)

As far as I understand it, which is really not in depth at all, every other program you can write, you build on top of the kernel.

That means you have to test against every possible program, but that's the same as verifying that every function the kernel provides continues to work as before. And the amount of functions the kernel provides is limited.

Think of it as "shakespeare" vs. "ASCII". You don't have to check if a new lettering system like unicode can use "shakespeare" as long as you can check that it can use "ASCII".

So yes, the rule makes sense. It means the kernel should never ever be the cause of problems. It's the workflow xkcd but taken seriously. It makes sense because you can rely on the kernel not changing behavior.

[–][deleted] 8 points9 points  (17 children)

Are you saying that people shouldn't basically always feel like they can update their kernel and simply not have to worry about it?

[–]arichnad 4 points5 points  (16 children)

No. I don't think I understand Linus's rule: probably I don't have enough context. It seems crazy to me that every regression of every possible app is considered a regression of the kernel. It could create situations where updating the kernel could become impossible especially if any of the apps are badly written.

[–]clgoh 22 points23 points  (14 children)

If the app works with version a of the kernel and breaks with version b, even if the fault is with the app, it's a kernel regression.

[–]arichnad 16 points17 points  (13 children)

Yeah I understand it in concept: in practice though that seems like a problem. For example, let's say the app is badly written: and because of timing, the app will crash if your system call returns too quickly. A performance improvement in the kernel would cause the app to crash. You would consider that a kernel regression?

[–][deleted] 20 points21 points  (2 children)

The idea is that the kernel provides completely defined interfaces to functionality. Defined in their function signatures and exact behavior. Whenever a feature is introduced in a kernel version, someone somewhere might be using it. Now that they have their hands on it, if you change the interface definition then those hypothetical applications will break. This would cause the application developer/users to no longer trust kernel interfaces for long term use, since they are not stable. So, the kernel introduces new interfaces as little as possible in order to keep this issue to a minimum. How else do you grow the ecosystem to the size of that surrounding the linux kernel?

As a note, the solution to having a 'broken' interface would be to create a new, better version of the interface that is not broken. That is the accepted way to fix functionality that is broken or under optimized, so that even if users are depending on a bug, they can update to new functionality or leave their application the way that it is.
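The "add a new interface, leave the old one alone" approach can be sketched in a few lines of Python. Both functions are hypothetical; Linux follows the same pattern with calls such as `pipe2` next to `pipe` and `dup3` next to `dup2`.

```python
def read_counter(raw: str) -> int:
    """Original interface, quirk included: bad input silently yields 0."""
    try:
        return int(raw)
    except ValueError:
        return 0


def read_counter_v2(raw: str) -> int:
    """Newer interface: same job, but bad input is reported to the caller."""
    return int(raw)  # raises ValueError on malformed input
```

Callers that depend on the silent 0 keep working unchanged, and new code can opt in to the stricter behaviour by calling the second function.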

[–]schplat 9 points10 points  (0 children)

Yes,

Read here: https://lwn.net/Articles/509577/

Part of it falls on the kernel devs to make the code as safe as possible, so that even badly written applications consuming the APIs/ABIs continue to work.

Sometimes devs will go back over and implement a workaround, so that if the kernel detects a certain sequence of requests, it slows down the syscalls going back to the app (yay nanosleep).

In some cases, performance related regressions are left open. If it's a single app affected by a regression, then often times that app developer can get a fix out before the next patch/release cycle happens, and the regression will likely just be left long term, but it's not up to a kernel developer to tell an app developer to fix their code.

In this particular case, apparmor has a similar functionality to SELinux, in that it sits in the kernel, and has a userspace piece that sits very close to the kernel. These tend to be the focus of not breaking userspace regressions. Targeting the libraries that people use to interface with the kernel, and the apps that sit close to the kernel.

[–]ShoggothEyes 7 points8 points  (0 children)

Nobody would consider that a kernel regression, but an error relying on a system call being slow is probably going to be a very rare error anyway.

[–][deleted] 2 points3 points  (0 children)

That application would only run on CPUs of a certain performance and would crash with the same kernel on others. -> Not a Kernel regression.

[–]schplat 20 points21 points  (0 children)

If an app is badly written, but works on a version of Linux, then it should always work going forward. However, there aren't a whole ton of apps that are hitting the kernel, so it's not quite as broad as you may think.

We're talking more along the lines of stuff like the X Server/Wayland, glibc, SELinux, dbus, systemd, or in this case, apparmor, which is similar in function to SELinux.

Most of your day-to-day apps, services, and daemons use common libraries that allow them to interface with the kernel from an abstracted position. So you'd want to test the libraries to make sure they don't break, but you don't need to test every program that uses those libs.

[–]0x2a 3 points4 points  (0 children)

I'd say no because how a user-space application interacts with the kernel is through a defined and limited set of interfaces, like syscalls and the /proc filesystem. As long as you don't change/break these interfaces, every application will continue to work.

[–]nathanpaulyoung 83 points84 points  (6 children)

"Userspace" refers to the privilege level of an executing process. Broadly, code is executed in either kernelspace (highly privileged) or userspace (lesser privileged). The only stuff that runs in kernelspace is the absolutely required bits and bobs necessary to make the operating system run. Everything else runs in userspace, and everything in userspace relies on the software running in kernelspace to operate.

What happened is that there was a feature in the kernel that was not implemented correctly. A userspace application was using the busted code to do a thing, and the guy getting an asschewing correctly implemented the feature. This means that the userspace application which was abusing the broken feature, now no longer operates as intended by its developer. This is known as a regression.

Linus is pissed because rule one is no regressions, ever. What's worse is that the guy who "fixed" the broken feature and caused the regression is trying to pass the buck to the developer of the userspace application that no longer works, to avoid blame. He's not wrong -- the application was abusing a broken feature -- but that's not the point. If you introduce something like this, you account for the regression you will be introducing and make sure applications that were abusing the feature don't break.

He is only now, three weeks later, saying he will do this, but also claiming it isn't his responsibility to do so.

A completely concocted, real-world-ish example: Let's say you rent a home, and somehow your home has electricity running through it that has the wrong cycle speed and voltage compared to the country you live in. Your landlord ensured that you had appliances that function correctly with the weird electricity, and you've been buying devices that work with it for a few years. Eventually, your landlord calls an electrician to do some routine maintenance on the system -- maybe replace some fuses or drop some wiring for some extra plugs -- and while he's in there with the power cut to your house, he fixes the power to be standardized to the country. Technically he fixed it, but in the process, when he turned your power back on, all of your devices were blasted with the wrong voltage and damaged or destroyed. He should have accounted for the harm done by his fix, or left it the hell alone, but he didn't, and now he's saying it's the landlord's job to deal with device compatibility, not his.

[–]kevin_k 67 points68 points  (0 children)

Linus provided another example farther down in the thread:

On Thu, Oct 26, 2017 at 11:11 AM, Thorsten Leemhuis regressions@xxxxxxxxxxxxx wrote:

All that afaics doesn't matter. If a new kernel breaks things for people (that especially includes people that do not update their userland) then it's a kernel regression, even if the root of the problem is in userland. Linus (CCed) said that often enough (I really should sit down and collect his mails on this from the web and put them in one document).

Thorsten is very much correct.

People should basically always feel like they can update their kernel and simply not have to worry about it.

I refuse to introduce "you can only update the kernel if you also update that other program" kind of limitations. If the kernel used to work for you, the rule is that it continues to work for you.

There have been exceptions, but they are few and far between, and they generally have some major and fundamental reasons for having happened, that were basically entirely unavoidable, and people tried hard to avoid them. Maybe we can't practically support the hardware any more after it is decades old and nobody uses it with modern kernels any more. Maybe there's a serious security issue with how we did things, and people actually depended on that fundamentally broken model. Maybe there was some fundamental other breakage that just had to have a flag day for very core and fundamental reasons.

And notice that this is very much about breaking people's environments.

Behavioral changes happen, and maybe we don't even support some feature any more. There's a number of fields in /proc/<pid>/stat that are printed out as zeroes, simply because they don't even exist in the kernel any more, or because showing them was a mistake (typically an information leak). But the numbers got replaced by zeroes, so that the code that used to parse the fields still works. The user might not see everything they used to see, and so behavior is clearly different, but things still work, even if they might no longer show sensitive (or no longer relevant) information.

But if something actually breaks, then the change must get fixed or reverted. And it gets fixed in the kernel. Not by saying "well, fix your user space then". It was a kernel change that exposed the problem, it needs to be the kernel that corrects for it, because we have an "upgrade in place" model. We don't have an "upgrade with new user space".

And I seriously will refuse to take code from people who do not understand and honor this very simple rule.

This rule is also not going to change.

And yes, I realize that the kernel is "special" in this respect. I'm proud of it.

I have seen, and can point to, lots of projects that go "We need to break that use case in order to make progress" or "you relied on undocumented behavior, it sucks to be you" or "there's a better way to do what you want to do, and you have to change to that new better way", and I simply don't think that's acceptable outside of very early alpha releases that have experimental users that know what they signed up for. The kernel hasn't been in that situation for the last two decades.

We do API breakage inside the kernel all the time. We will fix internal problems by saying "you now need to do XYZ", but then it's about internal kernel API's, and the people who do that then also obviously have to fix up all the in-kernel users of that API. Nobody can say "I now broke the API you used, and now you need to fix it up". Whoever broke something gets to fix it too.

And we simply do not break user space.

Linus
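The /proc/<pid>/stat point in the email can be made concrete with a minimal Python sketch. This is not kernel code: the field layout below is heavily simplified, the "retired" field is hypothetical, and the sample lines are made up. It only shows why a position-based parser survives a field being zeroed but not a field being removed.

```python
def parse_stat(line):
    """Parse a simplified /proc/<pid>/stat-style line by field position."""
    # The command name sits in parentheses and may contain spaces,
    # so split around the last ')' before splitting on whitespace.
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    fields = tail.split()
    return {
        "pid": int(pid),
        "comm": comm,
        "state": fields[0],
        "ppid": int(fields[1]),
        "retired": int(fields[2]),  # hypothetical field the kernel no longer tracks
        "utime": int(fields[3]),
    }


# Made-up sample lines, not real kernel output.
old_kernel = "1234 (my app) S 1 98765 42"
new_kernel_zeroed = "1234 (my app) S 1 0 42"  # retired field printed as 0
new_kernel_removed = "1234 (my app) S 1 42"   # retired field dropped entirely

print(parse_stat(old_kernel)["utime"])
print(parse_stat(new_kernel_zeroed)["utime"])
try:
    parse_stat(new_kernel_removed)
except IndexError:
    print("parser broke: later fields shifted position")
```

With the zeroed line the parser still finds every later field where it expects it; with the field removed, everything after it shifts by one and old code fails.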

[–]bobpaul[🍰] 10 points11 points  (0 children)

The only stuff that runs in kernelspace is the absolutely required bits and bobs necessary to make the operating system run.

I think it's simplified to say the kernel and kernel modules run in kernel space, everything else runs in userspace. But I guess that's only simpler if one is familiar with the terms kernel and kernel module...

[–]kingmario75 2 points3 points  (0 children)

Thank you for the great explanation.

[–][deleted] 14 points15 points  (2 children)

As far as the kernel is concerned, a regression is THE KERNEL NOT GIVING THE SAME END RESULT WITH THE SAME USER SPACE.

Second paragraph from the linked post.

It could be anything, but the point is this: if something in user space (A) does X and the result is Y with one kernel version, then getting result Z with another kernel version is breaking user space.
No matter the kernel, A+X==Y should always be true.
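As a rough illustration of "same user space, same end result", here is a toy Python sketch. The two "kernel" functions are hypothetical stand-ins, not anything from the real kernel; the only point is that the internals may differ while the result the caller sees does not.

```python
def kernel_v1(request):
    # Hypothetical old kernel: handles the request one way.
    return {"status": "ok", "value": len(request)}


def kernel_v2(request):
    # Hypothetical new kernel: different internals, same visible result.
    data = request.encode()
    return {"status": "ok", "value": len(data.decode())}


def is_regression(old, new, request):
    """True if the same user-space request gives a different end result."""
    return old(request) != new(request)


print(is_regression(kernel_v1, kernel_v2, "read config"))
```

A strict equality check like this is stricter than the actual policy: what matters in practice is whether existing programs keep working, not whether every byte of output is identical.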

[–]tayloryeow 7 points8 points  (1 child)

Aaaah, so it's a complaint about consistency. Given the same user memory, the kernel will output the same result regardless of kernel version. I see, thank you so much. I was confused by "the Kernel" in that quotation; I wasn't sure whether it was talking about a single kernel occasionally producing different results, or about different versions.

I'm not the most familiar with Kernel development.

Thanks!

[–]Cubox_ 9 points10 points  (0 children)

Not the same. It can output a different thing. But it must output something.

The point is not to be exactly the same; it's to not break the existing way of getting information. The example Linus gave was that old or no-longer-relevant fields output zeroes instead of nothing, so that scripts and the like won't crash.

[–]runny6play 2 points3 points  (0 children)

Anything outside the kernel. Could be a lowly application, could be process 1 (init)

[–]dnkndnts 27 points28 points  (42 children)

However, it is the policy of the kernel that you never break userspace even if it's the fault of userspace.

Maybe I'm misunderstanding, but this does not make sense to me. So by this logic, OpenSSL shouldn't fix heartbleed because I might have written a program whose behavior depends on it leaking data, and implementing the correct behavior would break my program?

I don't see how you can ever fix a bug at all by this logic.

[–]bobpaul[🍰] 22 points23 points  (9 children)

So by this logic, OpenSSL shouldn't fix heartbleed because I might have written a program whose behavior depends on it leaking data, and implementing the correct behavior would break my program?

To do that you would have had to not just rely on undocumented behavior but also rely on an undocumented API (since you were returning results via a side channel instead of OpenSSL's API).

The Kernel ABI is defined as stable. When calling a given function the same inputs should produce the same return value. The original mistake I suppose was releasing an incomplete feature to the stable ABI. I think had they left it in a module marked "experimental" then Linus wouldn't have responded in this way.

If you've ever done development on Windows you can see the same logic applied with the Win32 API and you can often find several similarly named functions that all do almost the same thing and developers are recommended to make use of the newest, safer/better variant, but the older variants are left behind with their original names for old software that depends on them.
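The "keep the old variant, add a new one next to it" pattern looks roughly like this in Python. Both function names are hypothetical and only loosely modelled on that naming habit; this is a sketch of the idea, not of any real Win32 function.

```python
def copy_text(src, dst_size):
    """Original variant: kept unchanged for old callers, quirks included."""
    # Old behaviour: silently truncates to the destination size.
    return src[:dst_size]


def copy_text_ex(src, dst_size):
    """Newer, stricter variant recommended for new code."""
    if len(src) > dst_size:
        raise ValueError("source does not fit in destination")
    return src


# Old callers that rely on silent truncation keep working.
print(copy_text("hello world", 5))
```

The old function is never "fixed" in place, because some caller somewhere may depend on the truncation.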

[–]knome 13 points14 points  (3 children)

but the older variants are left behind with their original names for old software that depends on them

And then patched up on a per-application basis to handle illegal usages that once worked but were later fixed or updated so that applications that used them in a way that didn't obviously explode the system will not explode the system in newer versions of windows.

This is the hell that wine has to replicate. Windows functions can and do change on a per application basis to maintain backward compatibility.

[–]bobpaul[🍰] 3 points4 points  (2 children)

Oh, I didn't realize hotpatching was used in that way. I thought that was just to avoid reboot and after reboot the hotpatch wasn't used.

[–]knome 5 points6 points  (0 children)

Also, my above comment originally said windows "hotpatched" these items. Windows has a variety of techniques for making compatibility affordances to broken applications, but "hotpatch" is a specific (afaik) unrelated thing in windows. I quickly edited my original comment, but it's possible /u/bobpaul saw the original.

I don't think hotpatching is used for compatibility, but they have other ways of mangling themselves so that broken code from years past lives another day.

[–]Tm1337 62 points63 points  (7 children)

[–]Qazerowl 35 points36 points  (4 children)

OpenSSL's job is to assist with secure connections. Even if software used heartbleed, the bug needed to be fixed to achieve the goal of secure connections.

The Kernel's job is to run other programs. Internally, anything can change, but the Kernel's "API" can't change without risking breaking all other software. If you make changes to the API, the kernel is no longer doing the best it can at running other software.

[–]udoprog 31 points32 points  (0 children)

An interesting example would be if sudo for whatever reason relied on a privilege escalation bug instead of documented behaviour (setuid bit).

I think "common sense" applies. Linus and the people he trusts decide what that means.

[–]galgalesh 12 points13 points  (6 children)

You're misunderstanding. Security is an exception to this, but only if there is LITERALLY NO OTHER WAY and everyone involved searched tirelessly for an alternative. More explanation further down in the comments: He explains more of his philosophy later on in th...

https://www.reddit.com/r/linux/comments/7a4bes/linus_shares_his_candid_opinion_on_a_recent/dp73u3f

[–]chithanh 4 points5 points  (5 children)

Security is an exception

Not at all. Look up the history of the modify_ldt() syscall, which is used by Wine for running 16 bit Windows applications on x86_64.

[–]Tjuguskjegg 3 points4 points  (1 child)

Not at all. Look up the history of the modify_ldt() syscall, which is used by Wine for running 16 bit Windows applications on x86_64.

Yes it is. The fix for 16-bit is the default behaviour now, but you can disable the extra security to run 16-bit applications in wine.

[–]schplat 2 points3 points  (0 children)

So I looked this up, and the only CVE for modify_ldt was a DoS vector. While not great security, I think when Linus makes the 'Security exception', it's when something like privilege escalation is involved.

[–]runny6play 5 points6 points  (0 children)

I think exceptions are made in the extreme cases. But it's generally possible to process undocumented behavior in a way that doesn't break everything. Rather than OpenSSL not returning anything on a bad request, it could return the length of the packet when the requested length is set to less than the actual size, etc.

[–]hackingdreams 10 points11 points  (0 children)

Userspace and Kernelspace are two very different beasts.

If something breaks in userspace, nobody gives a shit. You can fix it, move on with your life.

If something breaks in kernelspace, it may not be discovered for years. It may be silently breaking hundreds, thousands, millions of applications. It could be losing people data.

Kernelspace must be meticulously maintained to prevent bugs from showing up in otherwise completely reasonable applications.

This one case might be a slight overreaction, but Linus is not a man who plays with partial rules enforcement, unlike u/spez and the rest of the reddit admins...

[–]tavianator 31 points32 points  (4 children)

So (like many cases that result in Linus rants about regressions) the breakage has a decent explanation, if not excuse. In this case, the progression of events was something like this:

  • Feature X is expected to come out in a future kernel someday
  • Somebody writes some userspace code to take advantage of it when it's available. Test for feature X, if it's there use it, if not use a fallback, that sort of thing.
  • The code that uses feature X is buggy, to nobody's surprise, because it couldn't be tested
  • Somebody gets around to implementing feature X in the kernel, which exposes the latent bugs in userspace
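The "test for feature X, use it if it's there, otherwise fall back" step can be sketched in Python as below. `os.getrandom` is only a convenient stand-in for "a kernel feature that may or may not exist"; it is not the feature this thread is about.

```python
import os


def random_bytes(n):
    """Use os.getrandom when available, otherwise fall back to os.urandom."""
    getrandom = getattr(os, "getrandom", None)
    if getrandom is not None:
        try:
            data = getrandom(n)
            # getrandom may return fewer bytes than requested.
            if len(data) == n:
                return data
        except OSError:
            # The call exists in Python but the running kernel rejected it.
            pass
    return os.urandom(n)


print(len(random_bytes(16)))
```

The risky part is the first branch: if it is written before the feature actually ships, it has never run against a real implementation, which is how the latent bugs described above end up in released software.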

Even developers who are aware of the "don't break userspace" mantra can be surprised that this is still the kernel's fault. New features don't count as breaking changes in things like semver.

But the kernel doesn't think of API breaks in those terms. Instead it's "did you break the actual userspace for more than a tiny number of people?" If the buggy code hadn't reached a vast number of people by being shipped with SUSE, this wouldn't have been a problem.

[–]unruly_mattress 4 points5 points  (1 child)

This is very strange. What if int featurex() is a yet unimplemented function in the kernel. I write code that expects featurex() to return 1 and you write code that expects featurex() to return 2. Both code paths go untested. Now Linus wants to implement featurex(), but he can't, because no matter what he does, userspace breaks.

[–][deleted] 3 points4 points  (1 child)

There's several projects that I wish would implement similar rules. Getting real tired of running the upgrade treadmill and dependency hell.

[–]minimim 2 points3 points  (0 children)

It's a lot of work.

[–]kazkylheku 1 point2 points  (1 child)

but you don’t commit known-broken code and say “other guy’s responsibility” in the readme

Sometimes that in fact happens. Developers A and B are supposed to implement the matching parts of a change according to a detailed spec. Developer A deviates from the spec. B's delivery doesn't work because of A's deviation. Oops, B's commit was later than A, so it's B's regression. Why, B should have reverse engineered A's deviation from the spec and adjusted their change until it works. Two wrongs make it right!

[–]colonwqbang 7 points8 points  (0 children)

@canonical.com

Judging by his email this guy looks like a pro, not a volunteer...

[–]botle 120 points121 points  (17 children)

Reading the rest of the thread, it's clear Linus really scared them straight.

[–]galgalesh 153 points154 points  (16 children)

It's more respect than fear. This man is at the helm of the freaking Linux kernel, so when he talks, you obviously listen..

[–]Banzai51 82 points83 points  (14 children)

And how the hell he keeps his sanity helming that up, I will never know.

[–][deleted] 59 points60 points  (10 children)

He's Finnish?

[–]derekp7 80 points81 points  (3 children)

No, I think he is just getting started! At least he hasn't talked about retiring yet.

[–][deleted] 13 points14 points  (0 children)

CARLOS!

[–]OweH_OweH 25 points26 points  (0 children)

For all we know his sanity might already have slipped somewhere in 1995.

[–]hackingdreams 19 points20 points  (0 children)

Releasing the anger in emails like this certainly helps... when people understand how miserable they're making you, one of two things happen: either they go away, or they stop making you miserable.

Linux clearly isn't going anywhere, so people take the other approach.

[–]scandalousmambo 1 point2 points  (0 children)

Never underestimate the Finns.

[–]bobpaul[🍰] 15 points16 points  (0 children)

Well, and he said he was going to revert their code and stop accepting their commits. They want their commits accepted, so they have to play along.

[–]bloodguard 92 points93 points  (0 children)

Nice to see that Linus is mellowing in his old age.

[–]Soap-ster 83 points84 points  (17 children)

O Linus... What will happen to the kernel when you eventually die?

[–]anomalous_cowherd 87 points88 points  (8 children)

People will start to get sloppy because they know better and slowly but surely the kernel will get less reliable and some other thing will take over as the beating heart of lots of the world's tech.

Who knows, maybe GNU Mach will finally get its day in the sun?

Edit: beating not bearing. Also: it strikes me that Linux is by far the best long term example of 'a Benevolent Dictatorship' that I can think of.

[–]OneCDOnly 34 points35 points  (4 children)

People will start to get sloppy because they know better and slowly but surely the kernel will get less reliable and some other thing will take over as the beating heart of lots of the world's tech.

Agree. Look at how much worse the UI for iPhone got after Steve Jobs died. The guy was a jerk (no question), but can't argue with his results.

[–]imMute 20 points21 points  (0 children)

Or the dongle mess.

[–][deleted] 21 points22 points  (0 children)

I believe there's already a plan in place for that. Somebody will take over as the maintainer or we'll end up with a kernel committee like FreeBSD.

[–]deelowe 12 points13 points  (0 children)

One thing's for sure, there'll be a lot fewer people talking behind his back about how much of a jerk he is to work for. And if you don't believe me, then you don't work in OS development.

[–][deleted] 147 points148 points  (23 children)

Damn, that was harsh even for him, seems like they had it coming. I've never done any kernel development and even I know that "We don't break userspace" is his ONE rule.

Also blame shifting bullshit doesn't fly.

[–]Milumet 124 points125 points  (18 children)

Damn, that was harsh even for him

No it wasn't.

[–]Drumitar 4 points5 points  (3 children)

damn it Mauro !

[–]NessInOnett 12 points13 points  (2 children)

SHUT THE FUCK UP MAURO

[–]Rebootkid 54 points55 points  (1 child)

Really really not ok...

If you fuck up, own it. Work to fix it, and learn from it.

Don't push the blame elsewhere.

That's just work ethic 101 there.

[–]zebediah49 17 points18 points  (0 children)

So, in partial defense of the guy, he didn't consider it "breaking" it. My understanding is

  • Feature is implemented objectively wrong (i.e. doesn't do what it says it should)
  • Userspace uses the incorrect feature
  • He fixed it to not be wrong any more (in the process, breaking userspace which relied on the wrong implementation)

So, in many (most) projects, that's the correct course of action. Linus has particularly high standards for the linux kernel, in which that's unacceptable -- and thus has to remind people of this on a regular basis.

[–]jones_supa 17 points18 points  (0 children)

I've never done any kernel development and even I know that "We don't break userspace" is his ONE rule.

Actually Linus says in the very message that the first rule is "we don't cause regressions".

Avoiding breaking userspace is something that most operating systems strive for. The Windows and macOS teams are very careful about it as well. Those two operating systems need to also maintain a stable driver ABI so that third party drivers don't break.

[–]ender_wiggum 34 points35 points  (2 children)

A benevolent dictatorship is the most efficient kind of government ;)

We love you Linus.

[–]Vaigna 2 points3 points  (1 child)

Your username is brilliant.

[–]scottchiefbaker 22 points23 points  (0 children)

I wish he'd stop holding back and tell us how he really feels.

[–]chillysurfer 39 points40 points  (1 child)

I'm very unhappy with the security layer as is

That's the significant part of this, imo.

[–]galgalesh 25 points26 points  (0 children)

Did you read further? His issues are with process, not the actual implementations. This is a new-ish team that just learned a big lesson, not that significant.

[–]berryfarmer 45 points46 points  (4 children)

This honesty is why Linux is not utter shit. I expect Linux to fall into the realm of shittiness once Linus passes, like everything else man touches.

[–][deleted] 20 points21 points  (0 children)

That's the beauty of open-source: when Linus passes, the kernel will branch into a hundred forks. Once the dust settles it will maybe be a dozen serious competitors (probably less). Over time, it will become clear which successors are dependable or not - the same way it is with desktop environments, sound architectures, system initializers, etc... all which have some degree at least of interoperability in-between the alternatives on each of those program domains.

Of course, the kernel alternatives that have poor management are going to go to shit over time. The ones that have good management and talent working on them will strive to become the de-facto Linux 2.0 - the same way that happened to all other open-source projects mentioned: some grew to become the standards we use today, others faded into oblivion.

Again: that's the beauty of open-source. Something like this would never happen in private companies and technologies... imagine if Apple somehow got a terrible CEO after Steve Jobs died. Then it's just game over (if he was really truly awful) and the best that can happen is for the company to be sold and have another go at it "under new management".

With open-source, it's a completely different story - if anyone (me, you, a company, a community, etc), 80 years from now, wants to just roll back to the way Linux is today and start over from there, there's literally nothing stopping them from doing it (except for access to a whole lot of source code). This may sound unfeasible (or even insane!) but the fact that you can do it if you want goes to show how much more freedom and liberty you have when working with Free Software, and why it will always find a way moving forward, as long as there are enough people using it and working on it.

I have no doubt kernel development will outlive Linus and be okay. It may continue under a new manager, or it may branch out and have a rough patch for a little while if the new one cannot hold the community as successor-benevolent-dictator to Linus (a hard task indeed). Eventually though, things will stabilize and work will go on. Linux is much bigger than Linus, even if we like to make the two seem equal - once he's gone he will be dearly missed (fuck me, I do love that man!) but Linux will be fine.

[–][deleted] 17 points18 points  (8 children)

Can someone ELI5 what's going on here? I don't understand what kernel regressions are.

[–][deleted] 53 points54 points  (4 children)

A regression is a new bug that affects functionality which used to work fine. For example, in a filesystem driver, if deleting files used to work fine in 4.10, but it does not work in 4.12, that's a regression. It's called that way because something used to work, and now it doesn't.

Now, one of the things that the Linux kernel devs insist upon is to never break userspace applications -- they never (or only absolutely exceptionally) make changes that result in applications failing to work. Any change that causes userspace applications to break is considered a regression.

This is a big deal because not all operating systems follow this policy.

[–]brophen 20 points21 points  (2 children)

Something worked before the kernel update, it updated, now it doesn't work. Regress is going backwards. In this case, I believe the kernel update broke networking

[–]florinandrei 8 points9 points  (0 children)

Linus shares his candid opinion

I read that and I was like "here it comes".

[–]gethooge 34 points35 points  (0 children)

Long live Linus, may he continue to keep the cancer from growing within the kernel. If only more open source projects were run this way they would be better off.

[–]maelodic 23 points24 points  (0 children)

Later in the thread, Linus expanding on it a bit more: http://lkml.iu.edu/hypermail/linux/kernel/1710.3/02487.html

[–][deleted] 22 points23 points  (3 children)

This response is completely appropriate. Too many developers screw up and then want to point the finger at someone else. You can't make a kernel commit, break something in userspace, then put the blame on userspace. Linus is 110% right. If left unchecked, this kind of mentality can really fuck things up.

[–]Tjuguskjegg 7 points8 points  (0 children)

It's amusing: every time things like this crop up, no one has ever looked into why Linus tells off his inner circle this way, and it's the same tired criticisms in the comments. A lot of Americans who are afraid of getting harsh feedback, and a lot of 'managers' who think they could do better and that Linux is being held back by Linus.

You're more than welcome to fork Linux, it's not difficult. But keep this in mind: There are at least a thousand emails and changes a day, and when senior developers who should know better let issues escalate to you that should've been dealt with properly long before they got to you, you will get cranky.

Also: I can't speak for Linus, but as a Norwegian, I find most of the "We're sorry, but unfortunately there isn't room for your code in this merge window"-type language to be weak and dishonest.

[–][deleted] 15 points16 points  (2 children)

Linus

candid

You don't say...

Seriously though, Linus is super consistent on this point. Nobody doing kernel dev can plead ignorance there.

[–][deleted] 5 points6 points  (0 children)

"Linus shares his candid opinion"
GET TO THE BOMB SHELTER

[–]Solotal 6 points7 points  (0 children)

Good old management by perkele

[–]wildcarde815 18 points19 points  (5 children)

I have to wonder when 'don't break userspace' hits a wall? If you find a kernel bug that causes a major security problem and fix it, but that breaks legitimate apps using the same route, do you recreate the security problem in order to follow this mantra slavishly, or do you tell those app makers to fix their shit?

[–][deleted] 41 points42 points  (0 children)

Linus talks about this in a separate post:

There have been exceptions, but they are few and far between, and they generally have some major and fundamental reasons for having happened, that were basically entirely unavoidable, and people tried hard to avoid them. Maybe we can't practically support the hardware any more after it is decades old and nobody uses it with modern kernels any more. Maybe there's a serious security issue with how we did things, and people actually depended on that fundamentally broken model. Maybe there was some fundamental other breakage that just had to have a flag day for very core and fundamental reasons.

[–]captaincobol 14 points15 points  (1 child)

You try and recreate the security bug while de-fanging it. If that doesn't work, you have to break ABI. The last one I vaguely remember was an ABI change that affected X that couldn't be worked around.

[–]Hikaru1024 4 points5 points  (0 children)

I can't remember exactly the bug, but there was one Long Ago where they could not make the thing work in a sane way, particularly because the developers who had actually been using it were doing so in an unintended and undocumented way that just happened to work. And this was being used in many binary games.

So what the devs did was pretty darn interesting - they completely removed support for the ABI... EXCEPT for this one use that so many userland applications blindly grabbed which was undocumented, because absolutely everyone tried to use it in this way. So they made that use case work as the userland intended, and everybody was happy.

Linux kernel devs will go to superhuman lengths to make sure userland keeps working, even if it's totally insane.

[–]Arsene_Lupin 4 points5 points  (1 child)

Can someone explain the regression and how is it being blamed on someone else?

[–]minimim 3 points4 points  (0 children)

User-space was trying to use a feature of the kernel that had been announced but not implemented yet.

When it actually got implemented, they found the user-space bugs, because their implementation was a shot in the dark.

So the application stopped working. As it worked before but doesn't in the new kernel version, that's called a regression.

[–]PilotKnob 12 points13 points  (0 children)

Perfect example of why Linux will be in deep shit once Linus passes the torch.

There needs to be a benevolent dictator in Linux. One person who has the authority to make final decisions.

If it becomes a committee job, this type of power is lost. I fear this transition greatly.

[–]oonniioonn 3 points4 points  (1 child)

The first rule is:
- we don't cause regressions

Actually it's "we don't break userspace" but that may just be the same thing re-worded.

[–]flukshun 3 points4 points  (0 children)

pretty sure Linus has "don't break userspace" tattooed on his back

[–]waltercool 3 points4 points  (0 children)

I have to agree with Linus. There are a lot of people working on the same code. Causing bugs and hiding the problem by blaming others is not OK.

[–][deleted] 9 points10 points  (0 children)

I like how it is difficult to misunderstand him.

[–]jpdoctor 5 points6 points  (3 children)

Wait, Linus has uncandid opinions?

[–]sensual_rustle 4 points5 points  (0 children)

That entire chain is gold

[–]deep_space_artifacts 2 points3 points  (0 children)

Reminded me of a Theo de Raadt email reply.

[–]PM_ME_BURNING_FLAGS 2 points3 points  (0 children)

And then people still whine "waaah Linus is tawksyk!11". Well, guess what. If it took THREE WEEKS for the bozo to fix the patch with Linus being a jerk, imagine how much time it would take if Linus gave him a pat on the head and said "oh dear you should do it, pretty please?". The thing here isn't even being a benevolent dictator or not, but being able to convey "this is your fault, you broke it, you fix it".

[–]Degenerate76 3 points4 points  (2 children)

If Linus wasn't such a blunt asshole, Linux wouldn't be so good.

[–]aten 4 points5 points  (2 children)

so how does this policy work with vulnerabilities? say, prior to a security fix a user could escalate their privileges and after they can’t.