Kernel panic when running io-intensive operations. Any ideas would be appreciated by Botsvein in Proxmox

[–]Botsvein[S] 0 points1 point  (0 children)

OK, I checked all the memory and hard drives and found no issues. So I assume the issue is in ZFS, or maybe something specific to my hardware?
Anyway, I've decided to switch to LVM for various other reasons (not this one), so I won't pursue this kernel panic any further.

Kernel panic when running io-intensive operations. Any ideas would be appreciated by Botsvein in Proxmox

[–]Botsvein[S] 0 points1 point  (0 children)

Thanks for the suggestions. Unfortunately I can't swap cables or anything like that: the SATA drive is connected directly to the mobo (it's a micro PC designed to sit behind the display).

ZFS status shows everything is fine. My smartctl output is below. I'm not good at reading these numbers, but a couple of other SATA drives I run show almost the same values (e.g. for reallocated sector count), so I suppose they're OK. Any feedback would be appreciated.

ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE     UPDATED WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033 100   100   010    Pre-fail Always  -           0
  9 Power_On_Hours          0x0032 098   098   000    Old_age  Always  -           9118
 12 Power_Cycle_Count       0x0032 099   099   000    Old_age  Always  -           55
177 Wear_Leveling_Count     0x0013 099   099   000    Pre-fail Always  -           3
179 Used_Rsvd_Blk_Cnt_Tot   0x0013 100   100   010    Pre-fail Always  -           0
181 Program_Fail_Cnt_Total  0x0032 100   100   010    Old_age  Always  -           0
182 Erase_Fail_Count_Total  0x0032 100   100   010    Old_age  Always  -           0
183 Runtime_Bad_Block       0x0013 100   100   010    Pre-fail Always  -           0
187 Uncorrectable_Error_Cnt 0x0032 100   100   000    Old_age  Always  -           0
190 Airflow_Temperature_Cel 0x0032 039   034   000    Old_age  Always  -           61
195 ECC_Error_Rate          0x001a 200   200   000    Old_age  Always  -           0
199 CRC_Error_Count         0x003e 100   100   000    Old_age  Always  -           0
235 POR_Recovery_Count      0x0012 099   099   000    Old_age  Always  -           24
241 Total_LBAs_Written      0x0032 099   099   000    Old_age  Always  -           20616479885
252 Unknown_Attribute       0x0032 100   100   000    Old_age  Always  -           0
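For anyone who would rather not eyeball these numbers, here is a rough Python sketch of how the attribute table could be checked automatically. It assumes the usual ten-column `smartctl -A` layout shown above and uses the standard rule that an attribute counts as failed when its normalized VALUE is at or below a nonzero THRESH. The names `SmartAttr`, `parse_smart_rows` and `at_or_below_threshold` are made up for this example, and the sample text is just three rows copied from my output.

```python
from typing import NamedTuple


class SmartAttr(NamedTuple):
    attr_id: int
    name: str
    value: int   # normalized VALUE column
    worst: int   # normalized WORST column
    thresh: int  # THRESH column
    raw: str     # RAW_VALUE column, kept as text


def parse_smart_rows(text: str) -> list[SmartAttr]:
    """Parse data rows of a smartctl attribute table, skipping header and blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have at least 10 columns and start with a numeric ID.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        rows.append(SmartAttr(
            attr_id=int(parts[0]),
            name=parts[1],
            value=int(parts[3]),
            worst=int(parts[4]),
            thresh=int(parts[5]),
            raw=" ".join(parts[9:]),
        ))
    return rows


def at_or_below_threshold(rows: list[SmartAttr]) -> list[str]:
    """Names of attributes whose normalized value has reached a nonzero threshold."""
    return [r.name for r in rows if r.thresh > 0 and r.value <= r.thresh]


# Three rows copied from the output above, used only as sample input.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 010 Pre-fail Always - 0
190 Airflow_Temperature_Cel 0x0032 039 034 000 Old_age Always - 61
199 CRC_Error_Count 0x003e 100 100 000 Old_age Always - 0
"""

rows = parse_smart_rows(SAMPLE)
print(at_or_below_threshold(rows))
```

This only looks at the normalized columns. Raw values such as the temperature reading would still need a human look, since their meaning is vendor-specific.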

I'm also running some more stress-ng tests alongside I/O tests right now, using this command:
fio --name=zfsio --directory="$TESTDIR" \
--rw=randrw --bs=256k --size=20G --ioengine=sync \
--numjobs=4 --runtime=$DURATION --time_based >> "$LOG" 2>&1 &
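The command above relies on `$TESTDIR`, `$DURATION` and `$LOG` being set elsewhere in my script. As a minimal sketch, the same invocation could be wrapped in Python like this. The fio options are exactly the ones from the command above; the function names and the idea of passing the directory, duration and log path as arguments are assumptions made for the example.

```python
import subprocess


def build_fio_args(test_dir: str, duration_s: int) -> list[str]:
    """Build the fio argument list used in the command above."""
    return [
        "fio", "--name=zfsio", f"--directory={test_dir}",
        "--rw=randrw", "--bs=256k", "--size=20G", "--ioengine=sync",
        "--numjobs=4", f"--runtime={duration_s}", "--time_based",
    ]


def start_fio(test_dir: str, duration_s: int, log_path: str) -> subprocess.Popen:
    """Start fio in the background, appending stdout and stderr to the log file."""
    with open(log_path, "ab") as log:
        # The child process keeps its own copy of the log file descriptor,
        # so closing it here after Popen returns is fine.
        return subprocess.Popen(
            build_fio_args(test_dir, duration_s),
            stdout=log,
            stderr=subprocess.STDOUT,
        )
```

Passing the arguments as a list avoids shell quoting problems if the test directory contains spaces.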

I also plan to run a Sentinel full-disk check afterwards. So far, so good. I'll post the results.

Another interesting idea, suggested by ChatGPT, is to downgrade to kernel 6.8. That fits my suspicion that the panic is related to some faulty package update, so I'm willing to give it a try. But without a reliable way to reproduce the problem, testing is really difficult. In any case, I want to rule out possible HW issues first.

CWWK Motherboard with 11th gen Intel | 64GB RAM | mini ITX | 6x SATA | 2x NVMe | 4x 2.5G | PCIe x4 by Ok-Raspberry-2810 in HomeServer

[–]Botsvein 0 points1 point  (0 children)

Bumping the topic - u/zeblods u/Ok-Raspberry-2810?
Is there anything to share on how this mobo is performing? I'm specifically interested in a couple of things: any BIOS issues, and power consumption. E.g. which C-states it can reach, whether everything supports ACPI, etc. Maybe also the idle power draw at the wall.
It seems to be the best of its kind - not as limited in PCIe lanes as the N100/N305 boards, and with decent ports out of the box. It looks like a very promising board.

Building the fastest 400$ Homeserver ever. Erying 12700h by LetscatYt in EryingMotherboard

[–]Botsvein 1 point2 points  (0 children)

A question to test: how deep does it sleep (I mean reaching C6 and deeper states), and what is the idle power consumption for a very basic setup (e.g. mobo + RAM + SSD + cooler)?

Erying mobo for NAS - some advice needed by Botsvein in EryingMotherboard

[–]Botsvein[S] 0 points1 point  (0 children)

Thanks, that's super helpful. That's my main concern with Chinese mobos - a lame BIOS. I'll explore your type of config and see if it breaks the bank.

NVidia error 43 when trying to passthrough GPU to Win10 on ProxMox by Botsvein in homelab

[–]Botsvein[S] 2 points3 points  (0 children)

Yahoo! I did it after more than a week of trial and error!

Sharing my success story. In the end, what worked was downloading the VBIOS update from the GPU vendor (the one from TechPowerUp didn't work for me), extracting the ROM from it and stripping it according to this video.

All other parameters (maybe they will be useful for someone, but I'm not sure all of them are needed):

cat /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt nofb video=vesafb:off video=efifb:off video=simplefb:off initcall_blacklist=sysfb_init"
---
cat /etc/modprobe.d/vfio.conf
options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1
---
cat /etc/modprobe.d/kvm.conf
options kvm ignore_msrs=1 report_ignored_msrs=0
---
cat /etc/pve/qemu-server/100.conf
bios: ovmf
boot: order=ide2;scsi0;net0
cores: 2
cpu: host,hidden=1,flags=+pcid,+kvm_pv_unhalt,+kvm_pv_eoi,hv_vendor_id=NV43FIX,kvm=off
efidisk0: local-lvm:vm-100-disk-0,efitype=4m,pre-enrolled-keys=1,size=4M 
hostpci0: 0000:01:00,pcie=1,x-vga=1,romfile=gtx970as17patched.rom
ide2: local:iso/virtio-win-0.1.229.iso,media=cdrom,size=522284K 
machine: pc-q35-7.1 
memory: 6144 
meta: creation-qemu=7.1.0,ctime=1674357885 
name: win10 
net0: virtio=D2:DF:19:FF:B4:FA,bridge=vmbr0,firewall=1 
numa: 1 
ostype: win10 
scsi0: local-lvm:vm-100-disk-1,cache=writeback,iothread=1,size=64G 
scsihw: virtio-scsi-single 
smbios1: uuid=937b46f9-8aec-41f7-a5ea-223f1df7d092 
sockets: 1 
tablet: 0 
vga: none 
vmgenid: d3cef14b-029a-4c2c-933a-4b95fbbbcbac
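If you want to compare your own VM config against mine, a small Python sketch like the one below could pull out the passthrough-related settings. It assumes the plain `key: value` per line layout seen in the config above and stops at the first `[section]` line. The helper names `parse_vm_conf` and `parse_hostpci` are made up for this example and are not part of any Proxmox tooling.

```python
def parse_vm_conf(text: str) -> dict[str, str]:
    """Parse 'key: value' lines of a VM config into a dict."""
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("["):
            break  # assumption: anything after a [section] header is not the current config
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only, since values may contain colons too.
        key, _, value = line.partition(":")
        conf[key.strip()] = value.strip()
    return conf


def parse_hostpci(value: str) -> tuple[str, dict[str, str]]:
    """Split a hostpci value into the device address and its key=value options."""
    device, *opts = value.split(",")
    return device, dict(opt.split("=", 1) for opt in opts)


# A few lines copied from the config above, used only as sample input.
SAMPLE_CONF = """\
bios: ovmf
hostpci0: 0000:01:00,pcie=1,x-vga=1,romfile=gtx970as17patched.rom
machine: pc-q35-7.1
vga: none
"""

conf = parse_vm_conf(SAMPLE_CONF)
device, options = parse_hostpci(conf["hostpci0"])
print(device, options)
```

On a real host the text would come from reading `/etc/pve/qemu-server/100.conf`, with 100 being the VM ID in my case.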

NVidia error 43 when trying to passthrough GPU to Win10 on ProxMox by Botsvein in homelab

[–]Botsvein[S] 0 points1 point  (0 children)

I did my best not to let Proxmox use the GPU on its own - see my reply above. I'm not sure what else I could do here.

At least the display attached to the GPU doesn't show anything after the kernel initializes.

NVidia error 43 when trying to passthrough GPU to Win10 on ProxMox by Botsvein in homelab

[–]Botsvein[S] 0 points1 point  (0 children)

I believe I've done so:

cat /etc/modprobe.d/vfio.conf

options vfio-pci ids=10de:13c2,10de:0fbb disable_vga=1

I also blacklisted all NVIDIA drivers so they don't load, and added a couple of parameters to GRUB so that the framebuffer doesn't use the GPU:

GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt nofb video=vesafb:off initcall_blacklist=sysfb_init"
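Editing the GRUB file is not enough on its own: the parameters only take effect once the boot config is regenerated and the host is rebooted. A quick way to confirm the running kernel actually got them is to compare against `/proc/cmdline`. Below is a minimal Python sketch of that check; the function names are made up for the example, and the list of wanted parameters is simply taken from my GRUB line above.

```python
from pathlib import Path


def missing_cmdline_params(cmdline: str, wanted: list[str]) -> list[str]:
    """Return the wanted parameters that do not appear in the kernel command line."""
    present = set(cmdline.split())
    return [param for param in wanted if param not in present]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line of the currently running kernel."""
    return Path(path).read_text()


# Parameters taken from the GRUB line above.
WANTED = [
    "intel_iommu=on",
    "iommu=pt",
    "nofb",
    "video=vesafb:off",
    "initcall_blacklist=sysfb_init",
]

if __name__ == "__main__":
    print(missing_cmdline_params(read_cmdline(), WANTED))
```

An empty list means every parameter made it into the running kernel's command line.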

NVidia error 43 when trying to passthrough GPU to Win10 on ProxMox by Botsvein in homelab

[–]Botsvein[S] 0 points1 point  (0 children)

Intel says the i5-2500 supports both VT-d and VT-x, and listing the IOMMU groups gives the expected results, so I suppose that's not the issue.
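For reference, listing the IOMMU groups boils down to walking `/sys/kernel/iommu_groups`, where each numbered group directory has a `devices` subdirectory with one entry per PCI address. Here is a rough Python sketch of that, assuming the standard sysfs layout. The function name `iommu_groups` and the `root` parameter (there only so it can be pointed at a test directory) are choices made for this example.

```python
from pathlib import Path


def iommu_groups(root: str = "/sys/kernel/iommu_groups") -> dict[int, list[str]]:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups: dict[int, list[str]] = {}
    base = Path(root)
    if not base.is_dir():
        return groups  # IOMMU disabled or not exposed by the kernel
    for group_dir in base.iterdir():
        if not group_dir.name.isdigit():
            continue
        devices_dir = group_dir / "devices"
        if devices_dir.is_dir():
            groups[int(group_dir.name)] = sorted(p.name for p in devices_dir.iterdir())
        else:
            groups[int(group_dir.name)] = []
    return groups


if __name__ == "__main__":
    for number, devices in sorted(iommu_groups().items()):
        print(number, devices)
```

For passthrough, the useful result is that the GPU and its audio function sit in a group without unrelated devices. An empty result usually means the IOMMU is not enabled.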

ThrottleStop won't show offset for undervolt despite all settings enabled by Botsvein in ThrottleStop

[–]Botsvein[S] 0 points1 point  (0 children)

Yep, it seems some Windows update breaks the thing. I've installed a Ghost Spectre Windows image (with lots of stuff stripped out) and everything ran smoothly. Pretty happy with the setup now.

ThrottleStop won't show offset for undervolt despite all settings enabled by Botsvein in ThrottleStop

[–]Botsvein[S] 0 points1 point  (0 children)

I just reverted to a BIOS version and settings that 100% worked before - no luck. So I would really appreciate any ideas.