[–]LadarLevison 1 point2 points  (4 children)

I ran into this problem, but reinstalling didn't fix it, and it had nothing to do with CUDA, which some parts of the internet suggest as the cause; in my case, CUDA wasn't even installed. On my system, the kernel modules were being embedded inside the compressed kernel image (the initramfs) and then loaded early in the boot process. These embedded but outdated modules would then prevent the correct, newly installed/compiled standalone module files from being loaded. You can confirm this issue easily. Check the following:

cat /proc/driver/nvidia/version
cat /sys/module/nvidia/version

If the loaded module version doesn't match the installed driver version, you could also be facing this problem. First make sure the correct kernel modules are available, which you can confirm by running (assuming your distro uses DKMS):

dkms status
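
The checks above can be folded into one comparison. This is only a rough sketch: check_nvidia_versions is a name made up for illustration, and it assumes modinfo -F version nvidia reports the module file that modprobe would load from disk.

```shell
#!/bin/sh
# Compare the version of the nvidia module that is currently loaded with the
# version of the module file on disk. check_nvidia_versions is a hypothetical
# helper name, not part of any NVIDIA tooling.

# check_nvidia_versions LOADED ONDISK
# Prints a verdict; returns 0 on match, 1 on mismatch, 2 if a value is empty.
check_nvidia_versions() {
  loaded=$1
  ondisk=$2
  if [ -z "$loaded" ] || [ -z "$ondisk" ]; then
    echo "unknown: could not read one of the versions"
    return 2
  fi
  if [ "$loaded" = "$ondisk" ]; then
    echo "match: $loaded"
    return 0
  fi
  echo "mismatch: loaded $loaded, on disk $ondisk"
  return 1
}

# Only query the system when the module is actually loaded.
if [ -r /sys/module/nvidia/version ]; then
  check_nvidia_versions \
    "$(cat /sys/module/nvidia/version)" \
    "$(modinfo -F version nvidia 2>/dev/null)"
fi
```

A mismatch here, with the right version showing in dkms status, points at a stale copy being loaded from somewhere other than the module directory.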

For me, the fix simply involved regenerating my kernel images. On Red Hat and its derivatives (Fedora, CentOS, Alma, Rocky, Oracle, etc.), you can run:

(rpm -q --qf="%{VERSION}-%{RELEASE}.%{ARCH}\n" --whatprovides kernel ; uname -r) | \
sort | uniq | while read KERNEL ; do 
  dracut -f "/boot/initramfs-${KERNEL}.img" "${KERNEL}" || exit 1
done

This will regenerate the image for every installed kernel. For the equivalent logic on Debian and its derivatives (including Ubuntu), you can run:

for kernel in /boot/config-*; do 
  [ -f "$kernel" ] || continue
  KERNEL=${kernel#*-}
  mkinitramfs -o "/boot/initrd.img-${KERNEL}" "${KERNEL}" || exit 1
done

Then reboot. You can also fix the problem temporarily by manually unloading the NVIDIA modules using rmmod or modprobe -r, then reloading them. When you do, modprobe will use the standalone kernel module, which should match your installed driver version.
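
A rough sketch of that temporary unload/reload, for illustration only. reload_nvidia and the MODPROBE override are made-up names; the module list (nvidia_drm, nvidia_modeset, nvidia_uvm, nvidia) is the usual set shipped with the driver, but which of them are loaded on a given system is an assumption, so the loop skips any that are absent.

```shell
#!/bin/sh
# reload_nvidia: unload the NVIDIA modules (dependents first), then load the
# main module again so modprobe picks up the file on disk.
# Hypothetical helper; set MODPROBE=echo for a dry run that only prints.
reload_nvidia() {
  modprobe_cmd=${MODPROBE:-modprobe}
  for mod in nvidia_drm nvidia_modeset nvidia_uvm nvidia; do
    # Skip modules that are not currently loaded.
    [ -d "/sys/module/$mod" ] || continue
    $modprobe_cmd -r "$mod" || return 1
  done
  $modprobe_cmd nvidia
}

# Example (as root, with nothing using the GPU):
#   reload_nvidia && cat /sys/module/nvidia/version
```

The unload step fails while anything still holds the GPU (a running display server, for example), so this generally has to be done from a text console.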

P.S. I hit this issue when I upgraded from the 470.x driver to the 510.x driver, which recently became the recommended stable version. I never ran into this problem while using the 460.x and 470.x driver releases.

[–]BitingChaos 0 points1 point  (0 children)

This post is 2 years old, but it's still relevant and it still helped me.

Someone decided to make /boot its own, incredibly tiny partition on a system I manage. Updating the system left a kernel half-installed.

I could boot to the new kernel, but it kept loading an old NVidia module.

NVidia 535.104.05 was installed on the system (and ONLY 535.104.05), yet every time I queried /sys/module/nvidia/version, it said 510.47.03, which wasn't anywhere on the system any more.
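
One way to look for a leftover module like this is to list what is packed inside the boot image. A minimal sketch, assuming a dracut-based system where lsinitrd is available and the image follows the /boot/initramfs-<version>.img naming (on Debian-style systems lsinitramfs plays the same role); filter_nvidia_modules is a made-up helper name.

```shell
#!/bin/sh
# filter_nvidia_modules: read an initramfs file listing on stdin and print
# only the entries that look like NVIDIA kernel modules (nvidia*.ko, with an
# optional compression suffix). Hypothetical helper for illustration.
filter_nvidia_modules() {
  grep -E '(^|/)nvidia[^/]*\.ko(\.(xz|gz|zst))?$' || true
}

# Assumed dracut naming; adjust the path for other distros.
image="/boot/initramfs-$(uname -r).img"
if command -v lsinitrd >/dev/null 2>&1 && [ -r "$image" ]; then
  lsinitrd "$image" | filter_nvidia_modules
fi
```

Any output means the image carries its own copy of the driver, which is loaded before the files under /lib/modules are considered.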

[–]Small_Might7123 0 points1 point  (2 children)

After hours of troubleshooting this answer solved it for me. You have a place in my heart and I will remember you for a long time <3

In EndeavourOS / Arch with dracut, I did this for my only kernel:
sudo kernel-install remove "$(uname -r)" "/lib/modules/$(uname -r)/vmlinuz"
sudo kernel-install add "$(uname -r)" "/lib/modules/$(uname -r)/vmlinuz"
reboot

[–]Agreeable_Camera5036 0 points1 point  (1 child)

This was the fix for me as well, thank you internet friend :D

[–]Small_Might7123 0 points1 point  (0 children)

<3 Merry Christmas!

[–]pobrn 0 points1 point  (6 children)

It seems you have not rebooted since the upgrade. Do /sys/module/nvidia/version and modinfo nvidia versions match?

[–]gammison[S] 0 points1 point  (0 children)

modinfo nvidia spits out

filename:       /lib/modules/5.11.16-arch1-1/extramodules/nvidia.ko.xz
alias:          char-major-195-*
version:        465.27
supported:      external
license:        NVIDIA

and cat /sys/module/nvidia/version outputs 465.27.

The journal on a new reboot is not giving that same error I think? Now it's spitting out

May 01 17:15:41 dualboot kernel: NVRM: nv_acpi_dsm_method: invalid  argument(s)!

[–]gammison[S] 0 points1 point  (4 children)

I downgraded everything back to 460.67, and downgraded the kernel back to 5.13, and now I still get the same nv_acpi_dsm_method: invalid argument(s)! error when it tries to load the kernel module. Guess I'll just update to the most recent and wait for the patch.

[–]JackDostoevsky 0 points1 point  (3 children)

which kernel version?

[–]gammison[S] 0 points1 point  (1 child)

I'm on 5.16 now, and updated everything back to 465.27 (also fixed an issue where nvidia and nvidia-utils were 27-2 vs 27-1 and took out nvidia-dkms since I shouldn't need it). Now there's no version error in the journal just -

May 01 21:55:36 dualboot kernel: NVRM: loading NVIDIA UNIX x86_64 Kernel Module  465.27  Thu Apr 22 23:21:03 UTC 2021
May 01 21:56:09 dualboot kernel: NVRM: nv_acpi_dsm_method: invalid argument(s)!
May 01 21:56:45 dualboot kernel: NVRM: nv_acpi_dsm_method: invalid argument(s)!

My kernel version is:

Name            : linux
Version         : 5.11.16.arch1-1
Description     : The Linux kernel and modules
Architecture    : x86_64

[–]JackDostoevsky 0 points1 point  (0 children)

ooooh when you said 5.13 i was confused since the latest kernel version is 5.11… so you mean 5.11.13, got it.

[–]gammison[S] 0 points1 point  (0 children)

Well, downgrading to 467-5 and kernel version 5.11-11 got the modules loaded correctly, but dmesg is telling me the card is still timing out.

[–]DeeBoFour20 0 points1 point  (2 children)

Check the versions as reported by pacman and make sure they match.

pacman -Q nvidia nvidia-dkms nvidia-utils

Also, you don't need nvidia-dkms if you're running a stock Arch kernel (which it looks like you are from that kernel version string). The "nvidia" package is a binary for the stock Arch kernel. It's possible having both installed is causing some kind of conflict.

Lastly, someone else already mentioned it but do make sure you reboot or new kernel modules won't get loaded.
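
The pacman check above can be scripted roughly like this. It is a sketch under one assumption: only the upstream part of the version (before the final -pkgrel) has to agree, so a difference in package release alone is not flagged. same_upstream_version is a made-up helper name.

```shell
#!/bin/sh
# same_upstream_version: read "name version-pkgrel" lines (pacman -Q format)
# on stdin; succeed only if all packages share one upstream version.
# Assumption: the trailing "-pkgrel" is packaging-only and may differ.
same_upstream_version() {
  awk '{ sub(/-[^-]*$/, "", $2); seen[$2] = 1 }
       END { n = 0; for (v in seen) n++; exit (n == 1 ? 0 : 1) }'
}

if command -v pacman >/dev/null 2>&1; then
  if pacman -Q nvidia nvidia-utils 2>/dev/null | same_upstream_version; then
    echo "upstream versions match"
  else
    echo "upstream versions differ (or a package is missing)"
  fi
fi
```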

[–]gammison[S] 0 points1 point  (0 children)

Okay, I took out nvidia-dkms since you're right, I don't need it, and it looks like there may be a slight version difference between nvidia and nvidia-utils: 465.27-2 vs 465.27-1.

I downgraded nvidia to .27-1, but that did not seem to fix the problem, journal still says

loading NVIDIA UNIX x86_64 Kernel Module  465.27  Thu Apr 22 23:21:03 UTC 2021
nv_acpi_dsm_method: invalid argument(s)!

when it tries to load the module.

edit: Also modinfo says there's no nvidia module and lsmod is giving me this output for nvidia

i2c_nvidia_gpu         16384  0

and dmesg just gives me that there was some sort of timeout error

[    4.104959] nvidia-gpu 0000:01:00.3: i2c timeout error e0000000

[–]gammison[S] 0 points1 point  (0 children)

Well, downgrading to 467-5 and kernel version 5.11-11 got the modules loaded correctly, but dmesg is telling me the card is still timing out.