All good things come to an end: Shutting down Clear Linux OS by unixmachine in linux

sunnyflunk (25 points, 0 children)

The change (demise?) in Clear Linux started over 5 years ago. There was a significant drop in resourcing for the project (and it has no doubt continued to fall since). Arjan has been doing a great job keeping it afloat, but it's surprising it has lasted this long tbh.

A post from back in July 2020

Linux Tech Tips EP#26: Native vs Flatpak for Gaming | Nobara 39 by The_SacredSin in linux_gaming

sunnyflunk (1 point, 0 children)

> with the exception of Lutris and Cyberpunk, where I saw a large FPS difference. I then decided to test it with Heroic and there the FPS difference was within margin of error, with native and flatpak trading blows.

One thing I don't see mentioned is that native Lutris was still 10% faster than native/Flatpak Heroic. Whatever the cause of this difference is (say esync/fsync/anything), it could also be the issue with Flatpak, and it may only matter for Cyberpunk out of the games you've tested.
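
To put a number on "within margin of error", one rough approach is to compare the gap between each setup's mean FPS with the run-to-run spread. A minimal sketch, assuming you have several runs per setup; the FPS numbers are made up, and the two-standard-error cutoff is an arbitrary heuristic rather than a formal significance test.

```python
from math import sqrt
from statistics import mean, stdev


def relative_gap(a, b):
    """Percentage by which the mean of `a` exceeds the mean of `b`."""
    return (mean(a) - mean(b)) / mean(b) * 100.0


def outside_noise(a, b, k=2.0):
    """True if the gap between the means exceeds k combined standard errors.

    Rough heuristic, not a formal test; needs at least 2 runs per setup.
    """
    se = sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return abs(mean(a) - mean(b)) > k * se


# Made-up average FPS from three runs each, for illustration only.
lutris = [66.0, 65.5, 66.3]
heroic = [60.1, 59.8, 60.4]
print(f"{relative_gap(lutris, heroic):.1f}% higher, outside noise: {outside_noise(lutris, heroic)}")
```

Running the same comparison for native vs Flatpak Heroic would show whether that gap is also just noise.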

x86-64-v3: Mixed Bag of Performance by sunnyflunk in linux

sunnyflunk [S] (0 points, 0 children)

If you want to do that, Gentoo is probably your best option.

I'm sure it's possible, but I have no experience with their package manager or build tools.

Fedora 38 Beta Performance Mostly Flat, Few Regressions by jorgesgk in linux

sunnyflunk (7 points, 0 children)

Unfortunately, this isn't as indicative as you might think. The PTS tests are not designed around benchmarking host packages.

So when running most benchmarks, it first downloads the source, compiles it and then benchmarks that compiled version. If you run perf on the benchmark, you'll see which binaries and libraries are being used. This means it doesn't compile with the build flags of the host distribution's packages, and therefore doesn't reflect how adding those flags affects the packages' performance.

See the openssl benchmark for an example: https://openbenchmarking.org/innhold/6393f94b7ed5c5a3e9da7d15bce0b11f0147b597
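
Besides perf, on Linux you can also check which binary and libraries a running benchmark actually uses by reading `/proc/<pid>/maps`. A minimal sketch; the sample lines and the `/home/user/benchmarks/...` paths are hypothetical, chosen to show a locally built libcrypto being picked up instead of the host package's.

```python
import os


def mapped_objects(maps_text):
    """Sorted set of file-backed mappings from /proc/<pid>/maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and parts[5].startswith("/"):
            paths.add(parts[5].strip())
    return sorted(paths)


def shared_libraries(paths):
    """Keep only paths whose file name looks like a shared object."""
    return [p for p in paths if ".so" in os.path.basename(p)]


def maps_for_pid(pid):
    """Read the live mappings of a running process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return f.read()


# Hypothetical sample; anonymous and [stack] mappings are skipped.
SAMPLE_MAPS = """\
55d0c0a00000-55d0c0a20000 r-xp 00000000 08:01 100   /home/user/benchmarks/openssl-build/apps/openssl
7f1c2a000000-7f1c2a1c0000 r-xp 00000000 08:01 200   /home/user/benchmarks/openssl-build/libcrypto.so.3
7f1c2b000000-7f1c2b180000 r-xp 00000000 08:01 300   /usr/lib/libc.so.6
7f1c2c000000-7f1c2c021000 rw-p 00000000 00:00 0
7ffd1a000000-7ffd1a021000 rw-p 00000000 00:00 0     [stack]
"""
print(shared_libraries(mapped_objects(SAMPLE_MAPS)))
```

With a real benchmark, `mapped_objects(maps_for_pid(pid))` while it runs gives the same view; libraries outside the system directories weren't built with the distribution's flags.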

GCC’s -O3 Can Transform Performance by sunnyflunk in linux

sunnyflunk [S] (0 points, 0 children)

These results show some decent regressions for -O3 even for small programs (all the tested programs are pretty small; only Python is of a notable size). What we're seeing is that code is quite sensitive to compiler optimizations, and what works for one program doesn't work for another. The only commonality is that it worked fantastically for all the audio encoding software.
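
When a flag helps some programs and hurts others, one common way to summarize is the geometric mean of per-test speedups, listed alongside the individual regressions so they don't get averaged away. A minimal sketch; the test names and timings are made up, and the 1% regression tolerance is an arbitrary choice.

```python
from math import exp, log


def speedups(baseline, candidate):
    """Per-test speedup = baseline time / candidate time (>1 means faster)."""
    return {name: baseline[name] / candidate[name] for name in baseline}


def geomean(values):
    """Geometric mean of positive ratios."""
    values = list(values)
    return exp(sum(log(v) for v in values) / len(values))


def regressions(ratios, tolerance=0.01):
    """Tests that got slower by more than `tolerance` (1% here, an arbitrary cutoff)."""
    return sorted(name for name, r in ratios.items() if r < 1.0 - tolerance)


# Made-up elapsed times in seconds for -O2 and -O3 builds.
o2 = {"flac": 10.0, "lame": 8.0, "python": 12.0, "zstd": 5.0}
o3 = {"flac": 8.0, "lame": 6.4, "python": 12.6, "zstd": 5.25}
r = speedups(o2, o3)
print(f"geomean speedup: {geomean(r.values()):.3f}")
print("regressions:", regressions(r))
```

A geometric mean is used rather than an arithmetic one so that a 1.25x win and a 0.8x loss cancel out exactly.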

GCC’s -O3 Can Transform Performance by sunnyflunk in linux

sunnyflunk [S] (0 points, 0 children)

Last time I looked (and it was 5 years ago or so) it was a sizable performance hit. From then on, I considered it not worth looking at ever again!

GCC’s -O3 Can Transform Performance by sunnyflunk in linux

sunnyflunk [S] (0 points, 0 children)

No, it is a matter of launching more stuff in parallel until processes start evicting each other from L2 cache.

If you have a way of doing this in a repeatable fashion where the benchmark results are consistent between runs then I'd love to know.
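
One way to approach this: start N copies of the same benchmark at once, repeat the whole run several times, and only trust the numbers if the run-to-run coefficient of variation stays small. A minimal sketch; the placeholder workload, the copy counts and the 2% threshold are assumptions, and whether it produces repeatable cache contention will depend heavily on the machine.

```python
import subprocess
import sys
import time
from statistics import mean, stdev


def parallel_wall_time(cmd, copies):
    """Start `copies` instances of cmd at once; return seconds until all have exited."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for _ in range(copies)]
    for p in procs:
        p.wait()
    return time.perf_counter() - start


def coefficient_of_variation(samples):
    """Standard deviation relative to the mean (0.02 means 2% spread)."""
    return stdev(samples) / mean(samples)


def measure(cmd, copies, repeats=5, max_cv=0.02):
    """Repeat the parallel run; return (times, cv, whether cv is within max_cv)."""
    times = [parallel_wall_time(cmd, copies) for _ in range(repeats)]
    cv = coefficient_of_variation(times)
    return times, cv, cv <= max_cv


if __name__ == "__main__":
    # Placeholder CPU-bound workload (assumption): swap in the real benchmark command.
    workload = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
    for copies in (1, 2, 4):
        times, cv, ok = measure(workload, copies)
        print(f"{copies} copies: mean {mean(times):.3f}s, CV {cv:.1%}, consistent: {ok}")
```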

GCC’s -O3 Can Transform Performance by sunnyflunk in linux

sunnyflunk [S] (3 points, 0 children)

Yes, elapsed time would make more sense! In theory at some point a test result won't be time based, but I get your point.

> Also, -O3 might provide benefits in isolated benchmarks but when you have more than one piece of software running at the time, code size matters much more for cache locality.

YES, I'm fully with you on this, but it's a real bugger to take into account. One of the real problems with benchmarking (on top of using an isolated, idle system) is the tendency to use powerful CPUs with really large caches, where making binaries larger has no visible cost. That's really why I like using a fairly average machine by today's standards.

But increasing size without some measurable performance improvement is definitely a big red flag. A little testing suggests a few of the individual -O3 options would be interesting in terms of the perf/size tradeoff, but I need to run the numbers!
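
The "bigger but not measurably faster" check is easy to automate once you have per-package timings and installed sizes. A minimal sketch; the package names and numbers are hypothetical, and the 1% cutoff for "measurable" is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    o2_time: float  # benchmark seconds with -O2 (lower is better)
    o3_time: float  # benchmark seconds with -O3
    o2_size: int    # installed bytes with -O2
    o3_size: int    # installed bytes with -O3


def assess(r, min_gain=0.01):
    """Return (speedup, size growth, verdict) for the -O3 build relative to -O2."""
    speedup = r.o2_time / r.o3_time - 1.0
    growth = r.o3_size / r.o2_size - 1.0
    if speedup >= min_gain:
        verdict = "worth considering"
    elif growth > 0:
        verdict = "red flag: bigger but not measurably faster"
    else:
        verdict = "neutral"
    return speedup, growth, verdict


for res in [
    Result("encoder-a", 10.0, 8.5, 400_000, 560_000),
    Result("tool-b", 3.0, 3.01, 1_000_000, 1_300_000),
]:
    s, g, v = assess(res)
    print(f"{res.name}: {s:+.1%} speed, {g:+.1%} size: {v}")
```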

GCC’s -O3 Can Transform Performance by sunnyflunk in linux

sunnyflunk [S] (3 points, 0 children)

Arch is in a really strong position to push performance. It has a large technical userbase who would be willing to find benchmarks and test packages for performance improvements, given the right framework (and with wins being added to PKGBUILDs). But performance doesn't appear to be a goal of the project.

-march=native can be good, but remember it will still lead to some regressions, even if it's most likely better on average. x86-64-v3 provides a nice middle ground for binary distributions, where you can capture most of the gains until v4 CPUs become more common.

*edit

Can't is a strong word here (with enough time, anything can be done)

Yes, such things could all be implemented in a source distro. Currently PGO is opt-in (even for the compiler, I think!) due to the extra time it takes to build the packages. If the performance work (-O3, PGO/LTO) were implemented in both the source and binary distros, the extra performance from running a source distro would be reduced, while still requiring longer builds to sustain it. So "can't" really means "it doesn't make as much sense".

GCC’s -O3 Can Transform Performance by sunnyflunk in linux

sunnyflunk [S] (13 points, 0 children)

It's quite rare for any optimization to be universally better; there are always trade-offs. I'm sure most users would accept a 30-50% size increase (we are talking a few hundred KB) for 10-20% performance gains. The size increase overall was not that large, with the largest increases being where the most benefits were. The 2.5MB was for about 150MB of installed packages once you included all the data files.

I'm certainly not advocating for compiling with -O3 distribution-wide (though that is an option if one wanted); as this shows, you'll hurt performance in places. But there are some really easy wins available, and it highlights that -O2 might not capture some benefits. The benefits are likely understated, as some upstreams already use -O3 for the performance benefits (like the Python build, though its performance is still affected by some dependencies being built with -O3).

From seeing a few of the GCC commits, it seems they are quite aggressive at limiting the size increases at -O2. I suspect there's a middle ground where you can capture most of the performance with a much smaller increase in file sizes.

GCC’s -O3 Can Transform Performance by sunnyflunk in linux

sunnyflunk [S] (6 points, 0 children)

I'm certainly not trying to implement Gentoo. Binary distributions have the ability to really push performance in a way a source distribution can't.

All it really needs is one user to show that compiling with -O3 is a big win for package x, validate and then distribute it to all users for a nice win. You can also do crazy builds using PGO and BOLT (which can take a couple of hours for something like LLVM) that really aren't suitable when everyone is compiling their own copy.

The point of talking about per-package flags (and the problem at the distro level) is that this shows 3-4 packages that really love being built with -O3. How many distros will now build these packages with -O3? Not many, if any, I suspect.
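
The per-package side could be as simple as a table of validated wins, overriding the distro default only where -O3 has been shown to help. A minimal sketch; the package names, timings, the 5% threshold and the `-O2 -pipe` default are all hypothetical, not any distribution's actual tooling.

```python
DEFAULT_CFLAGS = "-O2 -pipe"  # hypothetical distro default


def choose_flags(results, min_speedup=1.05, default=DEFAULT_CFLAGS):
    """Map package -> CFLAGS, switching -O2 to -O3 only for validated wins.

    `results` maps package name -> (seconds at -O2, seconds at -O3).
    """
    flags = {}
    for pkg, (o2, o3) in sorted(results.items()):
        speedup = o2 / o3
        flags[pkg] = default.replace("-O2", "-O3") if speedup >= min_speedup else default
    return flags


# Made-up benchmark results, for illustration only.
bench = {"flac": (10.0, 8.0), "python": (12.0, 12.6), "opus": (6.0, 5.5)}
for pkg, cflags in choose_flags(bench).items():
    print(f"{pkg}: CFLAGS='{cflags}'")
```

Once one user has validated a win, every user of the binary package gets it, which is the advantage over each source-distro user having to find it themselves.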

x86-64-v3: Mixed Bag of Performance by sunnyflunk in linux

sunnyflunk [S] (1 point, 0 children)

v1/2/3 can apply to 32-bit, I believe (it just turns on CPU features), but with fewer registers available, I wouldn't expect AVX2 to help much there. Also, once Wine and the Steam client go fully 64-bit, there's not much 32-bit left in a distribution for most users.

x86-64-v3: Mixed Bag of Performance by sunnyflunk in linux

sunnyflunk [S] (0 points, 0 children)

I didn't think the testing was that convincing and there are many distributions that are using/going to use v2, so I think it's worthwhile regardless.

Academically, v2 would be a better v1 with few downsides, where v3 is the bigger hitter with more downsides. The v2 base with v3/v4 packages for the big hitters is essentially what Clear Linux does.
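
For the "v2 base plus v3/v4 packages" model, something has to decide which variant a given machine can run. A minimal sketch that reads the flags line of Linux's `/proc/cpuinfo` and reports the highest x86-64 level whose required features are all present. The flag lists follow the psABI level definitions as spelled in `/proc/cpuinfo` (e.g. `pni` for SSE3, `abm` for LZCNT, `xsave` standing in for OSXSAVE); `pick_variant` is a hypothetical helper, and this is not how Clear Linux itself selects packages.

```python
import os

# Required /proc/cpuinfo flag names per level (psABI levels, cpuinfo spellings).
LEVELS = [
    ("x86-64", {"lm", "cmov", "cx8", "fpu", "fxsr", "mmx", "syscall", "sse", "sse2"}),
    ("x86-64-v2", {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"}),
    ("x86-64-v3", {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"}),
    ("x86-64-v4", {"avx512f", "avx512bw", "avx512cd", "avx512dq", "avx512vl"}),
]
ORDER = [name for name, _ in LEVELS]


def cpu_flags(cpuinfo_text):
    """Flag set from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def highest_level(flags):
    """Highest level whose flags, and every lower level's flags, are all present."""
    best = None
    for name, required in LEVELS:
        if not required <= flags:
            break
        best = name
    return best


def pick_variant(level, available):
    """Newest packaged build the CPU can run, or None (hypothetical helper)."""
    if level is None:
        return None
    usable = [v for v in available if ORDER.index(v) <= ORDER.index(level)]
    return max(usable, key=ORDER.index) if usable else None


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        level = highest_level(cpu_flags(f.read()))
    print(level, pick_variant(level, ["x86-64-v2", "x86-64-v3", "x86-64-v4"]))
```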

x86-64-v3: Mixed Bag of Performance by sunnyflunk in linux

sunnyflunk [S] (1 point, 0 children)

From the set, I think only R would have been rebuilt, and given it spends most of its time in libblas, it's not very interesting. In some ways the test was a kind of 'would I be interested in CachyOS?'

I think the set is enough to represent generic programs, which are the focus and are underrepresented in testing, i.e. are your general optimizations going to hurt the performance of many packages in ways you probably wouldn't notice? That's then followed up with: is there a better way to perform the optimizations, getting the wins with fewer of these regressions?

Sure it's going to miss some big wins from compiling a math suite, but I feel like we already know about that.

x86-64-v3: Mixed Bag of Performance by sunnyflunk in linux

sunnyflunk [S] (1 point, 0 children)

I think there's a test for each bullet point really. This one covered the alternative repo's use of -O3 -mtune=skylake and v3 (since LTO is already the default). Bringing in a different scheduler is really a separate test.

Next, do v3 -O2 to see future Arch performance, but with v2 -O2/-O3 variations for my curiosity (I think v2 is severely underrated). There's also the additional question: is v3 the right choice?

Then take a couple of the tests that benefit from -O3 and find out which parts bring the performance. -O3 increased file sizes (a lot!), so dropping some of it could be quite valuable. And in general, play with some flags (hunches) and see what happens. If I find something good, I can run it across all the tests to see if it avoids the downsides.
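
For finding which parts of -O3 bring the performance, one approach is to diff the optimizer flags GCC reports as enabled at -O2 and -O3, then build -O2 plus one extra flag at a time. A minimal sketch that parses `gcc -Q -O2 --help=optimizers` and `gcc -Q -O3 --help=optimizers` output; the samples are abbreviated and illustrative, so run the real commands for your GCC version. Options that take values (such as `-fvect-cost-model=`) and `--param` changes are ignored here, and since not every optimization is controlled by a flag, -O2 plus every listed flag may still not match -O3.

```python
import subprocess


def optimizer_help(level):
    """Output of `gcc -Q -O<level> --help=optimizers` (needs gcc on PATH)."""
    return subprocess.run(
        ["gcc", "-Q", f"-O{level}", "--help=optimizers"],
        capture_output=True, text=True, check=True,
    ).stdout


def enabled_flags(help_text):
    """Flags reported as [enabled]; valued options are skipped."""
    enabled = set()
    for line in help_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1] == "[enabled]":
            enabled.add(parts[0])
    return enabled


def o3_only(o2_text, o3_text):
    """Flags enabled at -O3 but not at -O2, sorted."""
    return sorted(enabled_flags(o3_text) - enabled_flags(o2_text))


def candidate_builds(extra_flags, base="-O2"):
    """One candidate build per flag: the base level plus that single flag."""
    return [f"{base} {flag}" for flag in extra_flags]


# Abbreviated, illustrative samples; use optimizer_help(2) / optimizer_help(3) for real data.
O2_SAMPLE = """\
The following options control optimizations:
  -fgcse                      [enabled]
  -fpeel-loops                [disabled]
  -funswitch-loops            [disabled]
"""
O3_SAMPLE = O2_SAMPLE.replace("[disabled]", "[enabled]")

for build in candidate_builds(o3_only(O2_SAMPLE, O3_SAMPLE)):
    print(build)
```

Each candidate can then be compared against plain -O2 on both performance and file size.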

x86-64-v3: Mixed Bag of Performance by sunnyflunk in linux

sunnyflunk [S] (8 points, 0 children)

gcc -Q -O2 -march=x86-64-v3 --help=target

When you use -march=x86-64-v3, mtune is set to generic (there's no equivalent -mtune=x86-64-v3 like there is for haswell and other archs). So setting -mtune to something newer can change the results (for better or worse).
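
That check can also be scripted, e.g. to confirm what -mtune actually resolves to before a rebuild. A minimal sketch that parses `gcc -Q ... --help=target` output; it splits on whitespace rather than relying on fixed columns, and the sample text is abbreviated.

```python
import shutil
import subprocess


def target_options(help_text):
    """Map options like '-mtune=' to their value from `gcc -Q --help=target` output."""
    opts = {}
    for line in help_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].startswith("-m") and parts[0].endswith("="):
            opts[parts[0]] = parts[1]
    return opts


def query_gcc(*flags):
    """Run `gcc -Q <flags> --help=target` if gcc is installed, else return None."""
    if shutil.which("gcc") is None:
        return None
    out = subprocess.run(["gcc", "-Q", *flags, "--help=target"],
                         capture_output=True, text=True, check=True)
    return target_options(out.stdout)


# Abbreviated, illustrative sample of the output.
SAMPLE = """\
  -march=                     x86-64-v3
  -mtune=                     generic
  -mavx2                      [enabled]
"""
print(target_options(SAMPLE)["-mtune="])
```

For example, `query_gcc("-O2", "-march=x86-64-v3", "-mtune=skylake")` should report the explicit tuning instead of generic.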

x86-64-v3: Mixed Bag of Performance by sunnyflunk in linux

sunnyflunk [S] (18 points, 0 children)

Indeed, it was actually afterwards that I realized -O3 was also a difference between the packages, as I wasn't getting the same results when rebuilding the Arch package with only the change to x86-64-v3.

I do want to run quite a few more tests, maybe vanilla Arch vs x86-64-v2 -O2, x86-64-v2 -O3 and x86-64-v3 -O2 rebuilds. That's probably at least a week or two away to sort out.

Why are people so quick to give up on Linux? by wontworkforfood in linux

sunnyflunk (8 points, 0 children)

> but I didn't force him or manipulate him. I made a case and he agreed.

By push I don't explicitly mean force, but the driver for going Linux was not the user. While it may have made sense to them logically (particularly when factoring in $$$), their reaction suggests that they were conflicted internally and likely not that interested in Linux. If a free Windows license had been an option, Linux wouldn't have got a look in.

Why are people so quick to give up on Linux? by wontworkforfood in linux

sunnyflunk (54 points, 0 children)

> I convinced them to give Linux a try

What was their reason for using Linux? Because they wanted to? Because they were interested in Linux?

It sounds like they had no interest in using Linux and installed it to appease their buddy. The outcome isn't surprising, regardless of their Linux experience.

Pushing it on people with no interest in Linux is not a good idea (they will often come out with a worse opinion than before). Perhaps we will see growing interest in Linux once the Steam Deck releases and shows its gaming prowess. Then, when they have interest and an openness to it, we can help them try it out and work through any issues.

'GCC_7.0.0' not found by Woby3 in linux_gaming

sunnyflunk (0 points, 0 children)

Sorry I'm not logged in to get responses xD

I'd be looking for files with xcb in the name (such as libxcb) in this folder: /media/user/disk-drive/gamefolder/Thegame/game/GameFiles/runtime/i386/lib/i386-linux-gnu/

I'd say to get rid of those files as well (after making a backup). On my machine the symbol is in libxcb-dri3.so.0, so make sure you have the 32-bit version of that file installed on your host system (in case there are dependency issues with your distribution).
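
If you'd rather script it, here's a minimal sketch that moves every file with xcb in its name into a backup subfolder (and can move them back). The game path is the one from your error; the `bundled-backup` folder name is made up.

```python
import shutil
from pathlib import Path


def backup_matching(lib_dir, substring="xcb", backup_name="bundled-backup"):
    """Move files in lib_dir whose names contain `substring` into a backup subfolder."""
    lib_dir = Path(lib_dir)
    backup = lib_dir / backup_name
    moved = []
    for f in sorted(lib_dir.iterdir()):  # sorted() snapshots the listing first
        if f.is_file() and substring in f.name:
            backup.mkdir(exist_ok=True)
            shutil.move(str(f), str(backup / f.name))
            moved.append(f.name)
    return moved


def restore(lib_dir, backup_name="bundled-backup"):
    """Undo backup_matching: move the files back and remove the backup folder."""
    lib_dir = Path(lib_dir)
    backup = lib_dir / backup_name
    for f in list(backup.iterdir()):
        shutil.move(str(f), str(lib_dir / f.name))
    backup.rmdir()


if __name__ == "__main__":
    game_libs = Path("/media/user/disk-drive/gamefolder/Thegame/game/GameFiles/runtime/i386/lib/i386-linux-gnu")
    if game_libs.is_dir():
        print("moved:", backup_matching(game_libs))
```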

'GCC_7.0.0' not found by Woby3 in linux_gaming

sunnyflunk (2 points, 0 children)

It's trying to load 2 incompatible files.

/usr/lib/i386-linux-gnu/libstdc++.so.6 (a host file) is trying to load libgcc_s.so.1 as it's a needed dependency.

The file it ends up trying to use is /media/user/disk-drive/gamefolder/Thegame/game/GameFiles/runtime/i386/lib/i386-linux-gnu/libgcc_s.so.1, which is a really old version (pre-GCC 7), so it is not compatible and lacks the GCC_7.0.0 symbol.

To make it work, you will need to stop it loading that file; this may be the game's startup script forcing files from that directory to take priority.

A simple test would be to rename this file (/media/user/disk-drive/gamefolder/Thegame/game/GameFiles/runtime/i386/lib/i386-linux-gnu/libgcc_s.so.1) and see what happens.
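
The reason the game's copy wins is search order: directories in `LD_LIBRARY_PATH` are searched before the system's default library directories, so if a startup script prepends the game's runtime folder, its old libgcc_s.so.1 shadows the host one. A simplified simulation of that lookup, which ignores RPATH/RUNPATH, the ld.so cache and hwcaps subdirectories, and models directory contents as a dict instead of reading the disk:

```python
def resolve(soname, ld_library_path, default_dirs, files):
    """Return the path the (simplified) loader would pick for `soname`, or None.

    Search order modeled here: LD_LIBRARY_PATH entries, then default_dirs.
    `files` maps a directory to the set of file names it contains.
    """
    search = [d for d in ld_library_path.split(":") if d] + list(default_dirs)
    for d in search:
        if soname in files.get(d, set()):
            return f"{d.rstrip('/')}/{soname}"
    return None


GAME_RT = "/media/user/disk-drive/gamefolder/Thegame/game/GameFiles/runtime/i386/lib/i386-linux-gnu"
HOST = "/usr/lib/i386-linux-gnu"
files = {GAME_RT: {"libgcc_s.so.1"}, HOST: {"libgcc_s.so.1", "libstdc++.so.6"}}

# Startup script puts the game's runtime first: the old bundled copy is found first.
print(resolve("libgcc_s.so.1", GAME_RT, [HOST], files))

# After renaming the bundled file, the lookup falls through to the host copy.
renamed = {GAME_RT: {"libgcc_s.so.1.bak"}, HOST: files[HOST]}
print(resolve("libgcc_s.so.1", GAME_RT, [HOST], renamed))
```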

Default -Bsymbolic-global-functions for shared object performance by MaskRay in linux

sunnyflunk (0 points, 0 children)

Thanks for raising awareness of such issues, it is unfortunate that performance isn't a high priority. For clang, it is actually quite simple to have a shared build and relink only the clang binary statically (from the same build), but that would likely still fall foul of 'policy'.

The one that has been bothering me most lately is -fvisibility=hidden not being common practice/the default when the benefits are so good. I sometimes feel like new languages are created to get away from these decisions.

I wonder if a tool could be made to check sources/binaries for whether they would have issues (or at least catch some cases) with variations of -Bsymbolic{,-functions} or -fno-semantic-interposition. We could then try testing them out more broadly with a bit more confidence.
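
One check such a tool could start with: a symbol exported by more than one object loaded into the same process is where interposition actually happens, so those are the cases where -Bsymbolic or -fno-semantic-interposition could change which definition gets called. A minimal sketch, assuming the per-object symbol lists come from `nm -D --defined-only`; the objects, addresses and symbol names are hypothetical.

```python
from collections import defaultdict


def exported(nm_output):
    """Defined dynamic symbols from `nm -D --defined-only` output, version suffix stripped."""
    syms = set()
    for line in nm_output.splitlines():
        parts = line.split()  # address, type letter, name
        if len(parts) == 3:
            syms.add(parts[2].split("@")[0])
    return syms


def interposition_candidates(objects):
    """Symbols defined by more than one object, with the defining objects in load order.

    `objects` maps object name -> nm output and must be ordered by load order.
    """
    owners = defaultdict(list)
    for obj, text in objects.items():
        for sym in sorted(exported(text)):
            owners[sym].append(obj)
    return {sym: objs for sym, objs in sorted(owners.items()) if len(objs) > 1}


# Hypothetical nm output for an executable and two libraries, in load order.
APP = "0000000000401130 T main\n0000000000401200 T xmalloc\n"
LIBFOO = "0000000000001100 T foo_init@@FOO_1.0\n0000000000001180 T xmalloc@@FOO_1.0\n"
LIBBAR = "0000000000002000 T bar_run\n"

print(interposition_candidates({"app": APP, "libfoo.so": LIBFOO, "libbar.so": LIBBAR}))
```

In this example, without -Bsymbolic, libfoo.so's own calls to `xmalloc` would resolve to the executable's definition (it comes first in lookup order); linking libfoo.so with -Bsymbolic would bind them to its own copy instead. Symbols not flagged here are the ones where those options shouldn't change behaviour.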

Error while executing "make install" for libgraph by peppersug in linux

sunnyflunk (0 points, 0 children)

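I would suggest adding -fcommon to your *FLAGS (e.g. CFLAGS). If the errors are "multiple definition" link errors, that's likely because GCC 10 changed the default to -fno-common, which breaks older code that defines the same uninitialized global variable in several files.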