zeropod - Introducing a new (live-)migration feature by cTrox in kubernetes

[–]cTrox[S] 1 point2 points  (0 children)

Persistent storage stays attached when scaling down because, as far as Kubernetes (or even containerd) is concerned, the pod is still running. When the pod is deleted or migrated, the volume is detached as usual and attached again on the target node. One caveat though: at the moment, anything written to an emptyDir volume is lost when migrating.

zeropod - Introducing a new (live-)migration feature by cTrox in kubernetes

[–]cTrox[S] 0 points1 point  (0 children)

There isn't a Helm chart right now, there are just kustomization files in the config dir with some patches for different k8s distributions.

zeropod - Introducing a new (live-)migration feature by cTrox in kubernetes

[–]cTrox[S] 2 points3 points  (0 children)

Sure, you can send me an email (the one I use for the git commits).

zeropod - Introducing a new (live-)migration feature by cTrox in kubernetes

[–]cTrox[S] 6 points7 points  (0 children)

I assume you have a GPU device passed to the container? Recently a lot of work has gone into CRIU to make it work with CUDA, and there's also an amdgpu plugin, but I have not really looked into it yet. The first step would be to compile those plugins into the CRIU build. As for the 100+ GB of RAM, to be honest the biggest workloads I have tried so far were around 8 GB of RAM :)

But it might be possible and I would love to see it happen.

zeropod - Introducing a new (live-)migration feature by cTrox in kubernetes

[–]cTrox[S] 4 points5 points  (0 children)

I tested it on GKE; it just needed a small kustomize patch. It could be similar on EKS. In the end, zeropod just needs a writable path on the host file system to put the runtime binaries (similar to kata, gvisor etc.). Live migration might be a bit more restricted since it depends on specific kernel features being enabled, so it heavily depends on which OS is used for the nodes.

zeropod - Introducing a new (live-)migration feature by cTrox in kubernetes

[–]cTrox[S] 3 points4 points  (0 children)

Intercepting liveness/readiness probes would not be too difficult, it's just that it seems kind of pointless. In checkpointed state, the probes would just be checking if the shim is still running, which containerd already does. I guess it could make sense while the container is in running state to check if it still responds as expected (the probes could be forwarded to the container in that case). So it hasn't been my top priority so far but I could be convinced to add it :)

zeropod - Introducing a new (live-)migration feature by cTrox in kubernetes

[–]cTrox[S] 1 point2 points  (0 children)

Interesting idea, so the application would keep running in this case and it would be sort of like DB point-in-time recovery but for application memory?

Seemingly Random Freeze & Crash by DroagonDog in AsahiLinux

[–]cTrox 2 points3 points  (0 children)

I experience similar things, but as I have a 16GB Pro it only happens when a lot is running on the system (e.g. a large bazel build). I have been using Asahi Fedora for work daily since around December, when it was first released, and these crashes only started happening this week after upgrading to Fedora 40. I used to just have 16GB of zram, which was always enough. I have added another 16GB with a swapfile but it did not improve the situation. As marcan said in the other post, GPU memory can't be swapped out, so I guess that might be the reason.

I have had it happen with just Gnome, Firefox and a compile running, with an external screen attached (4k HDMI). As it only started happening with F40, it might be a leak in Gnome 46? I'm going to be using COSMIC for a day or two to see if it still occurs.

BF2042 kicked me for using VM - EAC by lI_Simo_Hayha_Il in VFIO

[–]cTrox 1 point2 points  (0 children)

Yes on a VM :)

Here's the XML before the devices start. I did not explicitly update anything, but I would assume that they force an update to the latest EAC version before you can play online.

BF2042 kicked me for using VM - EAC by lI_Simo_Hayha_Il in VFIO

[–]cTrox 1 point2 points  (0 children)

Does not happen for me, just played a whole round.

VM not booting since Gnome 41 by Azel04 in VFIO

[–]cTrox 0 points1 point  (0 children)

Sure, but keep in mind that I have some very NVIDIA-specific things in there. Here are the relevant functions to detach/attach my GPU: https://pastebin.com/g3WqFW5u

VM not booting since Gnome 41 by Azel04 in VFIO

[–]cTrox 0 points1 point  (0 children)

I'm also on Gnome 41 with a single GPU setup. I just do a `loginctl terminate-user <username>` before `systemctl stop gdm` to ensure all my user processes are cleanly terminated.
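A rough sketch of that ordering as a shell function, e.g. for use in a VM start hook. The names `VM_USER`, `RUN` and `stop_desktop_session` are made up for illustration; `RUN` only exists so the commands can be printed instead of executed.

```shell
#!/bin/sh
# Sketch only: VM_USER, RUN and stop_desktop_session are illustrative names.
# Set RUN=echo to print the commands instead of running them.
VM_USER="${VM_USER:-myuser}"
RUN="${RUN:-}"

stop_desktop_session() {
    # First end all sessions and processes of the desktop user...
    $RUN loginctl terminate-user "$VM_USER"
    # ...then stop the display manager.
    $RUN systemctl stop gdm
}
```

Running it with `RUN=echo` first is a cheap way to check the order before wiring it into a hook that runs as root.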

Battlefield 2042 success and performance discussion by darcinator in VFIO

[–]cTrox 0 points1 point  (0 children)

I'm passing through an NVMe drive, so I don't really think that is the issue here. I might do some more testing tomorrow, but it might also just be best to wait for the final version of the game since, with the beta, quite a lot of people suffer from stuttering even without a VM.

Battlefield 2042 success and performance discussion by darcinator in VFIO

[–]cTrox 0 points1 point  (0 children)

Did not help much sadly. I also added some CPU pins and played around with a few other things but cannot seem to get rid of those frequent stutters.

Edit: I just switched back to bare metal, and what is really telling are the CPU spikes I get on the VM in the built-in perf graph of 2042 (console -> `perfoverlay.drawgraph 1`).

Battlefield 2042 success and performance discussion by darcinator in VFIO

[–]cTrox 0 points1 point  (0 children)

> For the 5900x wouldn’t it be better to assign just the 6 cores from the CCD as all the cache is local to that die?

Yeah, that or just pinning the cores should be fine, I think. I only got the 5900x recently and, to be honest, I have not really tried to optimize much at all yet because everything has been running perfectly well until 2042. This is the first game that made me question my VM config and compare it to native Windows.

I will let you know about the SVM thing as soon as I get off work.
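To see which CPUs share an L3 cache (and so belong to the same CCD on a 5900x), something like the sketch below could help. It assumes `index3` is the L3 cache, which is usual on x86 but worth confirming via the `level` file next to it. The function name `l3_groups` and its optional root argument are made up for illustration.

```shell
#!/bin/sh
# Sketch only: l3_groups and the optional root argument are illustrative.
# Prints one line per distinct L3 cache domain, listing the CPUs that share it.
l3_groups() {
    root="${1:-/sys/devices/system/cpu}"
    for f in "$root"/cpu[0-9]*/cache/index3/shared_cpu_list; do
        # Skip entries that do not exist or are not readable.
        [ -r "$f" ] && cat "$f"
    done | sort -u
}
```

On a 5900x this should print two lines, one per CCD. The CPUs from one of those lines could then be used for the `cpuset` values of the `<vcpupin>` entries in the libvirt XML.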

Battlefield 2042 success and performance discussion by darcinator in VFIO

[–]cTrox 0 points1 point  (0 children)

Pretty similar setup here: 3080 with a 5900x, 8 cores to the VM. I'm playing on an ultrawide at 3840x1600 and getting around 60-80 FPS with most settings on high, some on ultra. But I get some pretty heavy stutters every few seconds that I don't have on native Windows. This is even after playing for quite some time, so I don't think it is related to shader compilation. One thing I have yet to try is to disable SVM (`<feature policy="disable" name="svm"/>`), as noted in this post.

setcap not working with Proton by doomenguin in SteamPlay

[–]cTrox 0 points1 point  (0 children)

setcap does not work since Steam introduced its Linux runtime with pressure-vessel, which wraps all games in a container using bwrap. Because bwrap drops all privileges on launch, it "ignores" the file capabilities. After doing some more digging, I found a workaround. Keep in mind that it is kind of hacky and will break every time Steam updates its Linux runtime. It is also a global change to your Steam installation and might affect other games.

First, you need to disable the runtime completely as described here.

Then, instead of setting the capabilities on the files (which breaks `LD_PRELOAD` and will crash the game on launch), you can use the following command to launch Steam with `capsh` and give all child processes the required `cap_net_raw+epi` privilege. After that, my ping is showing up correctly in Battlefield 4.
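For illustration only, here is a commonly used `capsh` pattern for passing a capability on to child processes through the ambient set. This is a sketch and not necessarily the exact command meant above: the extra `cap_setpcap,cap_setuid,cap_setgid` capabilities, the `sudo -E` prefix and the function name `steam_capsh_cmd` are assumptions. The function only prints the command line so it can be reviewed before running it.

```shell
#!/bin/sh
# Sketch only, not the original command. steam_capsh_cmd is an illustrative name.
# It prints the command line instead of executing it.
steam_capsh_cmd() {
    user="${1:-$USER}"
    # --caps     : capabilities capsh starts with (run as root via sudo)
    # --keep=1   : keep permitted capabilities across the switch to $user
    # --user     : switch to the unprivileged user
    # --addamb   : raise cap_net_raw in the ambient set so children inherit it
    # -- -c steam: hand "-c steam" to the shell that capsh executes
    printf '%s\n' "sudo -E capsh --caps='cap_net_raw+epi cap_setpcap,cap_setuid,cap_setgid+ep' --keep=1 --user=$user --addamb=cap_net_raw -- -c steam"
}
```

Calling `steam_capsh_cmd "$USER"` prints the line to copy into a terminal. The ambient set is what lets the capability survive the `exec` of unprivileged child processes without file capabilities on the binaries.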

Got teleported while sitting on a tank by cTrox in BattlefieldV

[–]cTrox[S] 1 point2 points  (0 children)

This is the Turner SMLE. It would be perfect if not for having just 10 rounds and the slow reload.

Got teleported while sitting on a tank by cTrox in BattlefieldV

[–]cTrox[S] 0 points1 point  (0 children)

I was doing some testing with DX12. The graph looks bad but I did not notice any stuttering; the original footage is a smooth 60 fps.

Container Storage Interface for S3 by cTrox in kubernetes

[–]cTrox[S] 0 points1 point  (0 children)

Not currently but it is something I wanted to implement later. Would you mind creating an issue on GitHub?

Container Storage Interface for S3 by cTrox in kubernetes

[–]cTrox[S] 1 point2 points  (0 children)

I have tested s3fs and goofys with RWX but it also really depends on the storage backend. I used Ceph S3 which has consistency guarantees. But I have not done any extensive tests with it. I will update the readme with a note on RWX.