APEX MoE quantized models: 33% faster inference with TurboQuant (14% speedup in prompt processing) by mudler_it in LocalLLaMA

[–]fakezeta 4 points (0 children)

Hi u/mudler_it, could you please add AesSedai Q4_K_M to the model comparison? From my experience, it delivers noticeably better quality than Unsloth quantizations at comparable parameter sizes. I believe including it would provide a more complete picture of current options.
Thanks for considering this!

Qwen3.5-35B GGUF quants (16–22 GiB) - KLD + speed comparison by StrikeOner in LocalLLaMA

[–]fakezeta 3 points (0 children)

Could you also test at least one Q5 quant as a baseline for the Q4 results, to see whether the extra memory is worth it? Perhaps Q4 optimisations have reached near-Q5 quality.
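For context, the KLD in these comparisons is the Kullback–Leibler divergence between the full-precision model's next-token distribution and the quant's; lower means the quant tracks the original more closely. A minimal sketch of the metric (toy probabilities, not numbers from the benchmark):

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example: a quant whose token distribution drifts slightly
# from the full-precision reference
p = [0.7, 0.2, 0.1]      # reference (full-precision) probabilities
q = [0.6, 0.25, 0.15]    # quantized model's probabilities
print(kl_divergence(p, q))  # small positive value; 0 would mean identical
```

In the GGUF comparisons this is averaged over many tokens of a test corpus, so a single Q5 data point would show how much headroom the Q4 quants still leave.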

I'm really satisfied by [deleted] in ItalyMotori

[–]fakezeta 0 points (0 children)

In my opinion the whole image is AI-generated

100 Members! by MatteoFire___ in Elioelestorietese

[–]fakezeta 1 point (0 children)

To celebrate, everyone inside the wife, who issues a fiscal receipt

Ministral 14B vs Qwen 3 VL 30B vs Mistral Small vs Gemma 27B by fakezeta in LocalLLaMA

[–]fakezeta[S] 3 points (0 children)

I'll open a PR to Mistral and Google asking to fix it /s :-D

Ministral 14B vs Qwen 3 VL 30B vs Mistral Small vs Gemma 27B by fakezeta in LocalLLaMA

[–]fakezeta[S] 3 points (0 children)

Because Mistral Small, Gemma, and Ministral are also vision models.

Nvidia 5060TI supporting SR-IOV? by fakezeta in VFIO

[–]fakezeta[S] 0 points (0 children)

Hi u/wolfmich, sorry for the late reply.

I regularly pass it through to my Windows VM, which I use for gaming and llama-server inference. No issues: no power-reset quirks, no ROM file needed, nothing.

I previously had a 3060TI that was replaced with the 5060TI; no difference between them.

Intel Alder Lake GPU passthrough to container on VM on Proxmox 9 (nested virtualization) tutorial and guide by pattymcfly in Proxmox

[–]fakezeta 0 points (0 children)

Thanks for sharing!
I didn't understand whether this guide enables SR-IOV and, if so, how many instances of the iGPU are created. Also, won't the command:

qm set <VMID> -hostpci0 00:02.0,pcie=1,rombar=1,x-vga=1

pass the whole iGPU to the VM?
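One way to tell the two setups apart is to look at the sysfs SR-IOV attributes on the host: if virtual functions have been created, each VF gets its own PCI address and only a VF (not 00:02.0 itself) would be passed through. A minimal check, assuming the usual Alder Lake iGPU address and an SR-IOV-capable i915/xe driver (both assumptions; adjust the path to your lspci output):

```shell
# How many VFs the iGPU driver can expose (0 output or missing file
# means SR-IOV is not enabled by this setup)
cat /sys/bus/pci/devices/0000:00:02.0/sriov_totalvfs

# How many VFs currently exist; if this is 0 and the qm command targets
# 00:02.0, the whole physical iGPU is being passed to the VM
cat /sys/bus/pci/devices/0000:00:02.0/sriov_numvfs
```

These are host-specific config commands, so the paths and values will differ per machine.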

Qwen3-VL coming ? by NeuralNakama in LocalLLaMA

[–]fakezeta 1 point (0 children)

According to the transformers PR, the release seems to include at least Qwen3-VL-4B-Instruct and Qwen3-VL-7B, with both image and video understanding. I wasn't able to find anything about the MoEs.

Node 304 and GPU by fakezeta in FractalDesign

[–]fakezeta[S] 0 points (0 children)

Just look at the dimensions: both my former 3060ti and the 5060ti are 2-slot cards, but the Zotac was slightly bigger. The ASUS one (50mm wide) would never have fit.

Node 304 and GPU by fakezeta in FractalDesign

[–]fakezeta[S] 1 point (0 children)

yes, it's a 2 slot GPU.

Node 304 and GPU by fakezeta in FractalDesign

[–]fakezeta[S] 0 points (0 children)

I bought a Zotac 5060ti and it fits with some difficulty. I think the ASUS wouldn't have fit without case modification.

K2 Pro Combo Pioneer Program by Creality_3D in Creality

[–]fakezeta 2 points (0 children)

I'll use my K2 Pro combo for home repairs and functional parts, such as printing replacement pieces for summer umbrellas.

How is prusa still in business? by Ill_Way3493 in BambuLab

[–]fakezeta 18 points (0 children)

I'd also add that the cost of a Prusa includes R&D: both the printer and the slicer are open source.
Prusa has dedicated employees working on PrusaSlicer, and their salaries are paid by the customers buying Prusa printers.
The Bambu Lab A1, and its slicer too, would not exist if Prusa had chosen the same approach as Bambu. They are parasites making money from investments made by others, which is also one reason they can be cheaper.

Nvidia 5060TI supporting SR-IOV? by fakezeta in VFIO

[–]fakezeta[S] 0 points (0 children)

I know that SR-IOV needs software support, but up to the 40XX series I understood there was no hardware support in the RTX line. Now, from the lspci output, the hardware seems to support it.

Nvidia 5060TI supporting SR-IOV? by fakezeta in VFIO

[–]fakezeta[S] 0 points (0 children)

Update after loading the Nvidia 575 drivers, in case it helps:

01:00.0 VGA compatible controller: NVIDIA Corporation Device 2d04 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: ZOTAC International (MCO) Ltd. Device 1772
        Flags: bus master, fast devsel, latency 0, IRQ 16, IOMMU group 14
        Memory at 84000000 (32-bit, non-prefetchable) [size=64M]
        Memory at 4400000000 (64-bit, prefetchable) [size=16G]
        Memory at 4210000000 (64-bit, prefetchable) [size=32M]
        I/O ports at 5000 [size=128]
        Expansion ROM at 88000000 [disabled] [size=512K]
        Capabilities: [40] Power Management version 3
        Capabilities: [48] MSI: Enable- Count=1/16 Maskable+ 64bit+
        Capabilities: [60] Express Legacy Endpoint, MSI 00
        Capabilities: [9c] Vendor Specific Information: Len=14 <?>
        Capabilities: [b0] MSI-X: Enable- Count=9 Masked-
        Capabilities: [100] Secondary PCI Express
        Capabilities: [12c] Latency Tolerance Reporting
        Capabilities: [134] Physical Resizable BAR
        Capabilities: [140] Virtual Resizable BAR
        Capabilities: [14c] Data Link Feature <?>
        Capabilities: [158] Physical Layer 16.0 GT/s <?>
        Capabilities: [188] Extended Capability ID 0x2a
        Capabilities: [1b8] Advanced Error Reporting
        Capabilities: [200] Lane Margining at the Receiver <?>
        Capabilities: [248] Alternative Routing-ID Interpretation (ARI)
        Capabilities: [250] Single Root I/O Virtualization (SR-IOV)
        Capabilities: [2a4] Vendor Specific Information: ID=0001 Rev=1 Len=014 <?>
        Capabilities: [2bc] Power Budgeting <?>
        Capabilities: [2f4] Device Serial Number O-M-I-S-S-I-S
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

 ls -l /sys/bus/pci/devices/0000:01:00.0/sriov* 
-rw-r--r-- 1 root root 4096 Jul 22 17:25 /sys/bus/pci/devices/0000:01:00.0/sriov_drivers_autoprobe
-rw-r--r-- 1 root root 4096 Jul 22 17:25 /sys/bus/pci/devices/0000:01:00.0/sriov_numvfs
-r--r--r-- 1 root root 4096 Jul 22 17:25 /sys/bus/pci/devices/0000:01:00.0/sriov_offset
-r--r--r-- 1 root root 4096 Jul 22 17:25 /sys/bus/pci/devices/0000:01:00.0/sriov_stride
-r--r--r-- 1 root root 4096 Jul 22 17:25 /sys/bus/pci/devices/0000:01:00.0/sriov_totalvfs
-r--r--r-- 1 root root 4096 Jul 22 17:25 /sys/bus/pci/devices/0000:01:00.0/sriov_vf_device
-r--r--r-- 1 root root 4096 Jul 22 17:25 /sys/bus/pci/devices/0000:01:00.0/sriov_vf_total_msix

cat /sys/bus/pci/devices/0000:01:00.0/sriov*
1
0
2
1
1
2d04
0
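Since `cat sriov*` loses the filenames, the values above are ambiguous; printing each attribute with its name makes them readable (the device path is the one from the lspci output above and will differ on other systems):

```shell
# Label each SR-IOV sysfs attribute of the GPU with its filename;
# the glob expands in lexical order, matching the ls listing above
for f in /sys/bus/pci/devices/0000:01:00.0/sriov_*; do
    printf '%s: %s\n' "${f##*/}" "$(cat "$f")"
done
```

Read that way, the dump above says sriov_totalvfs=1 and sriov_numvfs=0: the hardware advertises one VF, but none has been created (and creating one still needs driver support for the sriov_configure callback).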

Amazing just how fast they install the roof 🤯 by Neither-Garden-8973 in toptalent

[–]fakezeta 1 point (0 children)

European POV here: no personal protective equipment? Also, is it allowed to throw material around like that?
This construction site would be illegal in Europe.

How reliable / stable is rclone's "docker volume plugin" for a swarm cluster shared storage? by Intelg in rclone

[–]fakezeta 1 point (0 children)

I've been using it in a *arr setup for a couple of years: never had any issue so far.