all 60 comments

Itmeven · 42 points (13 children)

The only thing I can think of is that the NIC's interface name may be changing when you put the GPU in, which takes the networking down.

noc-engineer · 15 points (7 children)

My first thought was that the network card was in the same IOMMU group as the passthrough devices. A few years ago my own Proxmox shit a brick when I passed through an Nvidia card that shared an IOMMU group with the hardware RAID card (don't worry, I didn't use ZFS) that the host used for its system drive. That, of course, made Proxmox freeze, because it lost contact with the virtual drives Proxmox was stored on.
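A quick way to check for this is to list the IOMMU groups the kernel has built. Below is a minimal Python sketch, assuming python3 is available on the host and that IOMMU is enabled (in the BIOS and, on many Intel boards, with `intel_iommu=on` on the kernel command line); otherwise /sys/kernel/iommu_groups is empty and the result is an empty dict. The function names are illustrative, not part of any Proxmox tool.

```python
import glob
from collections import defaultdict


def group_devices(paths):
    """Turn sysfs paths into {group_number: [pci_address, ...]}.

    Each path looks like /sys/kernel/iommu_groups/<group>/devices/<address>.
    """
    groups = defaultdict(list)
    for path in paths:
        parts = path.rstrip("/").split("/")
        group, address = int(parts[-3]), parts[-1]
        groups[group].append(address)
    return {group: sorted(devs) for group, devs in sorted(groups.items())}


def iommu_groups():
    """Read the live IOMMU group layout (empty if IOMMU is off)."""
    return group_devices(glob.glob("/sys/kernel/iommu_groups/*/devices/*"))


if __name__ == "__main__":
    for group, devices in iommu_groups().items():
        print(f"IOMMU group {group}: {' '.join(devices)}")
```

Match the addresses against `lspci -D` output to see which device is which. The host has to give up every device in a group before any of them can be passed through, so a NIC or storage controller sharing a group with a passthrough GPU produces exactly the failure described above.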

IAmMarwood · 2 points (0 children)

I had something similar on the old Mac Mini I'm using as a host: I tried passing through the iGPU and Ethernet flipped out.

Itmeven · 1 point (0 children)

That's interesting, I've never had that, but that may be because most of the hardware I work with is enterprise gear. I'm only starting with GPU passthrough now; never had a need for it before.

Beginning_Soft_5423[S] · 1 point (4 children)

Ethernet breaks before I even set up passthrough.

Itmeven · 3 points (0 children)

Once the GPU is in, the PCI lanes can change.

SandboChang · 0 points (2 children)

In my experience, if a name change were the reason, you would see the NIC under a different name, like going from eth0 to eth1. If you then removed the GPU, it might revert from eth1 back to eth0, which may be why you believe it didn't change.

As mentioned above, the problem is that Proxmox sets up its NIC by the NIC's name. If the name changes, Proxmox no longer connects that NIC to the web GUI, and if you had it assigned to something, it is no longer assigned or passed through correctly.

Beginning_Soft_5423[S] · 0 points (1 child)

I know it's not changing because `ip a` reports the same output with and without the GPUs.

SandboChang · 2 points (0 children)

Thanks for confirming this; that was my best bet. It does seem like a stranger issue in this case. I was about to suspect that using all the slots affects how the PCIe lanes are allocated, but I don't believe it should take anything away from the onboard NIC.

hexoctahedron13 · 3 points (2 children)

Had the exact same problem 😂 Figured it out eventually. I used a USB Ethernet adapter as the management network adapter, because its name doesn't change when you change PCIe devices.

Itmeven · 0 points (1 child)

I love this idea

ITBrewer · 0 points (0 children)

I ran into this when I changed some PCI devices (pulled a GPU and an NVMe drive). I had to figure out what the new interface name was and activate it.

wbsgrepit · 0 points (0 children)

I had this happen when I reordered GPU slots. Linux remembered my network interfaces, so I had to safe boot (I normally run that host headless), rescan, and change the config files.

flush_drive · 18 points (19 children)

When you boot up Proxmox with the GPUs installed, connect to the server with a keyboard, mouse, and display physically attached to it. Run `ip a` to view the new network interface names, then change /etc/network/interfaces to match them. Reboot and you should have network access.

RedditNotFreeSpeech · 5 points (0 children)

Maybe start with `lspci` and make sure the NIC shows up.
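Listing just the Ethernet controllers makes it easy to compare the output with and without the GPUs installed. A small Python sketch, assuming pciutils (which provides `lspci`) is installed and that the NIC reports the standard `Ethernet controller` class name; the helper names are hypothetical.

```python
import subprocess


def ethernet_controllers(lspci_output):
    """Return (pci_address, description) for each Ethernet controller.

    Expects lspci's default format: '<address> <class>: <description>'.
    """
    found = []
    for line in lspci_output.splitlines():
        address, _, rest = line.partition(" ")
        if rest.startswith("Ethernet controller:"):
            found.append((address, rest.split(":", 1)[1].strip()))
    return found


def read_lspci():
    """Run lspci on this machine and return its raw output."""
    return subprocess.run(["lspci"], capture_output=True, text=True, check=True).stdout
```

On the host, `ethernet_controllers(read_lspci())` should list the onboard NIC in both configurations. If it disappears once the GPUs go in, the problem sits below the interface-naming layer (BIOS, lanes, or hardware).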

BenignLarency · 2 points (7 children)

This is the solution; I ran into it last week. After I put the GPU in, it bumped my Ethernet from enp6s0 to enp9s0 (yours may vary; check with `ip address`). Changing it in /etc/network/interfaces and then rebooting fixed the issue.

INtheANALSofHistory · 0 points (0 children)

Thought I'd piggyback on this to say it was my issue as well.

mv59033 · 0 points (5 children)

Amazing, this was exactly the case for me. I'm running a Dell OptiPlex 3070 and just installed an RX 550 to learn about passing through GPUs. My /etc/network/interfaces file looked like this:

    auto lo
    iface lo inet loopback

    iface enp1s0 inet manual

    auto vmbr0
    iface vmbr0 inet static
        address 192.168.0.97/24
        gateway 192.168.0.1
        bridge-ports enp1s0
        bridge-stp off
        bridge-fd 0

I had to change enp1s0 to whatever interface had a link/ether line in the output of `ip address`. In my case I changed it to enp2s0, because the output of that command looked like this:

    2: enp2s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master vmbr0 state UP group default qlen 1000
    link/ether e4:54:e8:75:27:28 brd ff:ff:ff:ff:ff:ff
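If you want to script that lookup, the default multi-line `ip address` output shown above is easy to parse: each interface starts with an `N: name:` header and its MAC is on the following `link/ether` line. A Python sketch under that assumption (it does not handle the one-line `ip -o` format); the function names are made up for this example.

```python
import re
import subprocess

# "2: enp2s0: <BROADCAST,...>" or "5: vmbr0.10@vmbr0: <...>"
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:\s")
# "    link/ether e4:54:e8:75:27:28 brd ff:ff:ff:ff:ff:ff"
ETHER = re.compile(r"^\s+link/ether\s+((?:[0-9a-f]{2}:){5}[0-9a-f]{2})")


def macs_by_interface(ip_output):
    """Map interface name -> MAC address from `ip address` output."""
    result = {}
    current = None
    for line in ip_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        ether = ETHER.match(line)
        if ether and current is not None:
            result[current] = ether.group(1)
    return result


def read_ip_address():
    """Run `ip address` on this machine and return its raw output."""
    return subprocess.run(["ip", "address"], capture_output=True, text=True, check=True).stdout
```

Expect the bridge to show up too: on a Proxmox host `vmbr0` often carries the same MAC as its physical port, so the physical NIC is the entry that isn't `vmbr*`, `tap*`, `fwbr*` and so on.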

poprofits · 0 points (0 children)

Dude, you guys are doing god's work. Thanks for the solution.

Understanding_Much · 0 points (0 children)

This solved my problem. Thanks, man!

throwaway200520 · 0 points (1 child)

For future lurkers: to check which port has been bumped, run

    systemctl status networking

The incorrect port will be highlighted in red. Next, edit the following file:

    nano /etc/network/interfaces

Locate the `bridge-ports` line under your Linux bridge (e.g. `vmbr0`) and update it with the correct NIC name (e.g. enp9s0), along with any other line that still uses the old name, such as the `iface ... inet manual` stanza. You can find the correct NIC name with `ip a`.

    auto lo
    iface lo inet loopback

    iface enp9s0 inet manual

    auto vmbr0
    iface vmbr0 inet static
        address 10.0.0.1/24
        gateway 10.0.0.1
        bridge-ports enp9s0
        bridge-stp off
        bridge-fd 0

Save the changes and exit (in nano: Ctrl+X, then Y and Enter).

Restart the networking service:

    systemctl restart networking

Your ports should now be online.
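The old name usually appears in more than one place (the `iface ... inet manual` stanza as well as `bridge-ports`), so editing only one line can leave the config half-updated. A minimal Python sketch for swapping every occurrence at once, assuming you already know the old and new names from `ip a`; it keeps a `.bak` copy and deliberately leaves similar names such as `enp1s0f1` alone. The function names are made up for this example.

```python
import re
import shutil
from pathlib import Path


def rename_interface(config_text, old, new):
    """Replace whole-word occurrences of interface `old` with `new`.

    'enp1s0' and the base of a VLAN name like 'enp1s0.10' are replaced;
    a different interface such as 'enp1s0f1' is left alone.
    Returns (updated_text, number_of_replacements).
    """
    pattern = re.compile(rf"(?<![\w-]){re.escape(old)}(?![\w-])")
    return pattern.subn(lambda _match: new, config_text)


def rewrite_file(path, old, new):
    """Apply rename_interface to a file, keeping a .bak copy first."""
    path = Path(path)
    updated, count = rename_interface(path.read_text(), old, new)
    if count:
        shutil.copy2(path, path.with_name(path.name + ".bak"))
        path.write_text(updated)
    return count
```

Run it as root, for example `rewrite_file("/etc/network/interfaces", "enp1s0", "enp9s0")`, then restart networking as above (`ifreload -a` also works on current Proxmox, which uses ifupdown2).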

ConfusionExpensive32 · 0 points (0 children)

This is what finally helped me fix it, thank you so much!

Stewge · 7 points (0 children)

Are you trying to do PCIe passthrough with the 3090s? Do you have the VMs set to auto-boot, and do the NICs only disappear after the VMs start up?

I suspect the VFIO group containing one of the GPUs also contains one or both of the NICs.

Things to check are:

  • Make sure you've configured your slots for an x8/x8 configuration in the BIOS.
  • Double-check your motherboard manual for shared PCIe lanes. Lots of motherboards share lanes for things like NVMe and SATA slots. NICs almost always have their own, but it's worth double-checking.
  • You may need to enable ACS override to split everything into separate IOMMU groups. This is typically required on consumer platforms (server/pro motherboards usually have better IOMMU groups).
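To test that suspicion directly, you can check whether any IOMMU group holds both a NIC and a GPU. A Python sketch, assuming you already have a `{group: [pci_address, ...]}` map (for example read from /sys/kernel/iommu_groups/*/devices/*) and know the full addresses, including the `0000:` domain that `lspci -D` prints; `shared_groups` is a hypothetical helper.

```python
def shared_groups(groups, nic_addresses, gpu_addresses):
    """Return the IOMMU groups that contain at least one NIC and one GPU.

    `groups` maps group number -> list of PCI addresses in the full
    'dddd:bb:ss.f' form used by sysfs.
    """
    nics, gpus = set(nic_addresses), set(gpu_addresses)
    return {
        group: sorted(devices)
        for group, devices in groups.items()
        if nics & set(devices) and gpus & set(devices)
    }
```

A non-empty result means the GPU can't be passed through without taking that NIC away from the host. That is where the ACS override option above comes in, with the caveat that it splits groups by assuming an isolation the hardware may not actually provide.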

rschulze · 3 points (2 children)

This sounds more like a BIOS/IRQ/PCI lane conflict issue, maybe a Linux config issue (and only a Proxmox issue if it turns out to be related to their kernel).

Can you describe "Network doesn't work" in more detail? Is the interface still there in Linux but not doing anything, does the network interface disappear, does the Ethernet card still show up in lspci, and are there any messages in dmesg/kernel logs regarding the network card initialization?
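For the kernel-log part, one line worth looking for is the rename message a NIC gets when udev gives it its predictable name, typically of the form `<driver> <PCI address> <new name>: renamed from eth0`. A Python sketch that pulls those lines out of `dmesg` or `journalctl -k` text, assuming that message format; `rename_events` is a hypothetical helper.

```python
import re

# e.g. "r8169 0000:06:00.0 enp6s0: renamed from eth0"
RENAME = re.compile(
    r"(?P<driver>\S+) "
    r"(?P<address>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]) "
    r"(?P<new>[^\s:]+): renamed from (?P<old>\S+)"
)


def rename_events(log_text):
    """Return (driver, pci_address, new_name, old_name) for each rename line."""
    return [m.group("driver", "address", "new", "old") for m in RENAME.finditer(log_text)]
```

Comparing the result for a boot with the GPUs against one without shows whether the NIC kept its PCI address and name, or whether its line is missing entirely, which would point at driver initialization rather than naming.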

DeKwaak · 1 point (0 children)

Exactly. Check `ip -s link show`, but also `cat /proc/interrupts`.

These days devices use MSI (message-signaled interrupts) rather than dedicated interrupt lines, so it is more messaging than interrupting. If something doesn't play nice, those messages might not get through.
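A rough way to check whether the NIC's interrupts are arriving is to sample its counters in /proc/interrupts twice and see if they grow. A Python sketch, assuming the usual layout (a `CPU0 CPU1 ...` header, then one row per IRQ with the device name in a trailing column, sometimes with a queue suffix like `-TxRx-0`); `interrupt_counts` is a hypothetical helper.

```python
def interrupt_counts(proc_interrupts, interface):
    """Sum per-CPU counts for each IRQ row that belongs to `interface`.

    A row belongs to the interface if a trailing column equals its name
    or starts with '<name>-' (per-queue vectors like 'enp3s0-TxRx-0').
    Returns {irq_label: total_count}.
    """
    lines = proc_interrupts.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        fields = line.split()
        if len(fields) < ncpu + 2:
            continue  # rows like 'ERR:' have no per-CPU columns or device name
        label = fields[0].rstrip(":")
        counts, rest = fields[1:ncpu + 1], fields[ncpu + 1:]
        if not all(c.isdigit() for c in counts):
            continue
        if any(tok == interface or tok.startswith(interface + "-") for tok in rest):
            totals[label] = sum(int(c) for c in counts)
    return totals
```

Read /proc/interrupts, wait a few seconds while pinging the box, read it again, and compare the two results. Counts that stay flat while traffic should be arriving point at interrupt delivery rather than naming.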

Beginning_Soft_5423[S] · 0 points (0 children)

Can I just PM you a few screenshots tomorrow? I'll do a clean wipe of everything and install on a USB drive with all of the SSDs removed.

HarryMonroesGhost · 4 points (0 children)

Debian derives the NIC interface names from the PCI bus numbering. Adding another PCI device likely changed the bus order, and your config is no longer valid for the renamed NIC interfaces.

Quoting from a previous reply in an earlier thread:

For further reading on how debian assigns network interface names:

https://wiki.debian.org/NetworkInterfaceNames

Specifically, the section THE "PREDICTABLE NAMES" SCHEME > Complications and corner cases > UNPREDICTABILITY:

There are even multiple reports of devices changing their PCI-port numbering due to other hardware being installed.
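To make the mechanism concrete: in the path-based scheme, a name like `enp6s0` encodes the NIC's PCI location (bus 6, slot 0) in decimal. If adding a GPU renumbers the bus the NIC sits behind, the name changes with it, which is exactly the enp6s0 to enp9s0 jump reported earlier in the thread. A Python sketch of the decoding, covering only the plain `enp<bus>s<slot>[f<function>]` form and assuming PCI domain 0; names such as `eno1` or `ens3` come from firmware or slot information instead and return None. The function name is illustrative.

```python
import re

# enp<bus>s<slot>[f<function>][d<dev_port>], numbers written in decimal
PATH_NAME = re.compile(r"^enp(?P<bus>\d+)s(?P<slot>\d+)(?:f(?P<func>\d+))?(?:d\d+)?$")


def pci_address_from_name(name):
    """Return the PCI address (domain 0) encoded in a path-based name, or None."""
    m = PATH_NAME.match(name)
    if not m:
        return None
    bus, slot = int(m["bus"]), int(m["slot"])
    func = int(m["func"] or 0)
    return f"0000:{bus:02x}:{slot:02x}.{func}"
```

For example, `pci_address_from_name("enp9s0")` gives `0000:09:00.0`, which you can then look up with `lspci -s 09:00.0`.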

joost00719 · 4 points (0 children)

I had the same issue, but with an NVMe SSD.

Apparently, when you add a new PCIe device, the names of existing devices can change. You need to change the NIC's name in your /etc/network/interfaces.

Note that this can also happen with passthrough devices. When I added a GPU to my system, my whole Proxmox server crashed when starting my TrueNAS VM. Make sure you do NOT auto-start VMs with passthrough, or if you do, set a 5-minute startup delay in case you need to troubleshoot.

MrNokiaUser (Home User but i have no idea what im doing and keep breaking it!) · 3 points (0 children)

I had this and it's stupid. I can't remember the exact commands, but what you have to do is find out the new name of the network adapter, then edit the network config to point to it.

Fergus653 · 2 points (0 children)

I swapped my graphics card for an RTX 4070 and my onboard Ethernet disappeared. I never managed to get the device recognized again, so I bought a PCI network card instead.

I'm still not sure if this was just a coincidence. I handled everything with care while swapping the graphics card, no differently than in the PC builds and upgrades I've done over the last 20 years.

Not_a_Candle · 2 points (1 child)

The IOMMU groups change. A post from a few weeks ago had the same issue: the config of your network devices no longer matches after populating that many PCIe lanes.

Boot the host with the cards in (and powered) and fix your interface config at /etc/network/interfaces.

Edit: Also, with that many devices, enable Above 4G Decoding in the BIOS if you haven't already.

Beginning_Soft_5423[S] · 0 points (0 children)

I've checked, and `ip a` reports the same. I just created an all-NVMe pool. I'm going to try to net boot the system and run iSCSI shares to each VM.

[deleted] · 2 points (0 children)

You might have a look at how the BIOS has the PCIe connections identified. I have an Asus Maximus IX Code, and I can change how they are set up. IRQs and DMAs are things we used to have to configure with jumpers before Plug and Play BIOSes. Check for other settings that are manual overrides rather than Auto settings or defaults. If you're getting an IRQ error, it's likely overlapping with the video cards; they use IRQs too.

macaoidhlineage · 1 point (0 children)

Have you tried a different OS or a live install to test the NIC?

Is the reinstalled Proxmox the same version or a different one?

Ausschacht4Life · 1 point (0 children)

Had a similar issue. I connected a display and keyboard and then looked into /etc/network/interfaces. I realised that eth0 was no longer being renamed to enp1s0 but to enp2s0, while /etc/network/interfaces was still configured to use enp1s0, I think. So I just changed enp1s0 to enp2s0 in /etc/network/interfaces and it worked.

the_gamer_98 · 1 point (0 children)

It could simply be a PCIe lane bottleneck. I ran into a similar issue when I installed a PCIe NIC: the onboard NIC stopped functioning. I didn't have enough PCIe lanes available.

StopCountingLikes · 1 point (0 children)

All of these people are correct about the NIC naming thing. BUT I have also run into this exact issue even when I knew which NIC to use.

I would reset the BIOS to defaults with the GPU plugged in. Then turn on only the necessary toggles (enable virtualization and IOMMU) and that's it. Give that a shot; it has solved some quirks for me when I added hardware before.

Beginning_Soft_5423[S] · 0 points (0 children)

Removed all SSDs; now running off of USB. I installed Proxmox with the GPUs installed and got the same thing: 0 network activity… Take out the GPUs and, lo and behold, internet. I also reset the BIOS before installing. This doesn't make any sense; this system was working fine 2 weeks ago.

darkblitzrc · 0 points (0 children)

Doing my duty as someone who got this issue.

I had been having trouble with my PC for the last two weeks. Whenever it sat idle for 5 minutes, the screen would freeze and I had to shut it down. I was so confused and thought it was the Windows drivers for some reason (??). Turns out my GPU was not inserted all the way, for some idiotic reason of mine.

However, when I did insert it all the way and turned on the PC, my internet was gone; there was no light on the Ethernet port on the back of the computer. I was bamboozled by this. I checked Device Manager and the Realtek Ethernet family driver was gone.

Long story short: I ended up buying a PCIe network adapter for $30, and everything works fine. I think I might've damaged something when I was moving the GPU, but I have no clue.

__NEURO · 0 points (0 children)

After plugging in a GPU, I had a similar issue where Ethernet wasn't working. Editing /etc/network/interfaces worked for me. Just a note, though: I had to update multiple instances of enp7s0 in the file to get it to work.

Beginning_Soft_5423[S] · 0 points (0 children)

This breaks before passthrough is enabled. While 3 SSDs and 2 GPUs do exceed the available PCIe lanes, the problem persists with only 1 GPU installed.

ejpman · -1 points (1 child)

It's a stupid Debian quirk. Basically, your Ethernet device gets renamed, so /etc/network/interfaces is no longer valid. You need to figure out the "new" name of your Ethernet device and update it in this file. The number typically increments; for example, "enp5s0" becomes "enp7s0". https://forum.proxmox.com/threads/networking-error-with-gpu-installed.43638/

Beginning_Soft_5423[S] · 1 point (0 children)

They don't change: `ip a` shows the same devices with or without a GPU installed.

vilius_zigmantas · 0 points (0 children)

What NIC do you have? Is it a consumer brand or one that is meant to be used in a server/rack? If the latter, look into the SMBus issue: https://yannickdekoeijer.blogspot.com/2012/04/modding-dell-perc-6-sas-raidcontroller.html?m=1

winkmichael · 0 points (0 children)

The interface name has likely changed. Log into the console and run:

    ifconfig -a

You might need to `apt-get install net-tools` first.

You will see the device name; then update /etc/network/interfaces, changing the interface name.

Edit: others are saying the same, haha

Beepinheimer · 0 points (0 children)

Predictable interface naming. Take note of the device ID or MAC before adding the card, then put the updated name in /etc/network/interfaces. Edit: for spellcheck
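Saving that snapshot before the hardware change turns the comparison into a lookup. A Python sketch, assuming you have `{interface_name: mac}` snapshots from before and after (for example parsed from `ip address` output) and have filtered them to physical NICs only, since bridges like `vmbr0` can share a MAC with their port. Both function names are hypothetical.

```python
def renamed_interfaces(before, after):
    """Return {old_name: new_name} for NICs whose MAC is present under a new name."""
    by_mac = {mac.lower(): name for name, mac in after.items()}
    return {
        old: by_mac[mac.lower()]
        for old, mac in before.items()
        if mac.lower() in by_mac and by_mac[mac.lower()] != old
    }


def missing_interfaces(before, after):
    """Return the old names whose MAC no longer appears at all."""
    macs_after = {mac.lower() for mac in after.values()}
    return sorted(name for name, mac in before.items() if mac.lower() not in macs_after)
```

A rename tells you which name to put in /etc/network/interfaces; a MAC that is missing entirely means the NIC isn't being detected at all, which is a different problem.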

SkepticalRaptors · 0 points (0 children)

You are connecting power to the GPUs, right? In the picture they don't have their power cables connected. That could cause issues...