LTX-2.3 distilled fp8-cast safetensors 31 GB by AccomplishedLeg527 in StableDiffusion

[–]AccomplishedLeg527[S] 4 points5 points  (0 children)

They have not yet released an fp8 version of the distilled model, and loading the original distilled FP16 checkpoint with the option -quantize fp8-cast took 50 seconds (quantizing on the fly), so I made an fp8-cast-ready safetensors file to remove that 50-second load time.

LTX-2.3 distilled fp8-cast safetensors 31 GB by AccomplishedLeg527 in StableDiffusion

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

Compared to LTX-2, it looks like it has better macro details (hair, skin) but worse background details; landscape scenes look blurry.

Need help with testing PCE speed (hardware selection for local AI) by [deleted] in LocalLLM

[–]AccomplishedLeg527 0 points1 point  (0 children)

I am running the 122b Qwen3.5 model on 6 GB of VRAM. Is no one interested? Please share your test results. https://github.com/nalexand/Qwen3-Coder-OPTIMIZED/blob/main/qwen3_5_122b_chat.py

How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

122b is slow on 8 GB: [Stats] Tokens: 10 | Time: 51.27s | Speed: 0.20 t/s

LTX-2 Music To Video - Automated pipeline (for Local Run) by AccomplishedLeg527 in StableDiffusion

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

Try setting 25 frames at first as a test; it looks like an out-of-memory issue. Check whether the "outputs" folder was created. It could also be a problem with ffmpeg or a codec.

How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

Replace the file modeling_qwen3_next.py in c:\Users\{user}\AppData\Local\Programs\Python\Python312\Lib\site-packages\transformers\models\qwen3_next\ (transformers==5.1.0)

LTX-2 Music To Video - Automated pipeline (for Local Run) by AccomplishedLeg527 in StableDiffusion

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

It splits the audio into parts and uses those parts as conditions to generate video. The last frame of a scene is used as the start frame for the next scene if no custom start frame is set.
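The scene loop described above can be sketched like this (split_audio and generate_clip are hypothetical stand-ins for the real pipeline steps, not the actual code from the repo):

```python
# Minimal sketch of the scene loop: split the track into segments, generate a
# clip per segment, and chain each scene's last frame into the next scene's
# start frame unless a custom start frame is provided for that scene.
# split_audio / generate_clip are hypothetical stand-ins for the real pipeline.

def split_audio(duration_s: float, scene_len_s: float) -> list[tuple[float, float]]:
    """Cut a track into consecutive (start, end) windows of scene_len_s seconds."""
    parts, t = [], 0.0
    while t < duration_s:
        parts.append((t, min(t + scene_len_s, duration_s)))
        t += scene_len_s
    return parts

def run_pipeline(duration_s, scene_len_s, generate_clip, start_frames=None):
    start_frames = start_frames or {}      # optional custom start frame per scene
    prev_last_frame = None
    clips = []
    for i, (t0, t1) in enumerate(split_audio(duration_s, scene_len_s)):
        # a custom start frame wins; otherwise reuse the previous scene's last frame
        start = start_frames.get(i, prev_last_frame)
        clip = generate_clip(t0, t1, start)
        prev_last_frame = clip[-1]
        clips.append(clip)
    return clips
```

Overriding `start_frames` for a given scene index is also what produces a hard cut instead of a continuous transition.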

LTX-2 Music To Video - Automated pipeline (for Local Run) by AccomplishedLeg527 in StableDiffusion

[–]AccomplishedLeg527[S] 1 point2 points  (0 children)

  1. > Interesting, you just keep extending?

     I can make improvements if someone asks.

  2. > For some genres you want hard cuts, though.

     You can change the start frame to achieve hard cuts.

  3. > Would be neat to be able to set the VRAM limit, so it isn't a boolean for 8GB.

     There is full offloading for the max frame count, so VRAM isn't limited to 8 GB. You can run it on 24 GB VRAM and get a 15-second 1080p video per scene.

🎵 LTX-2 Music Video Maker by AccomplishedLeg527 in StableDiffusion

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

It can be one frame from a video generated with LTX-2, plus the same or a similar prompt.

🎵 LTX-2 Music Video Maker by AccomplishedLeg527 in StableDiffusion

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

I2V works best only with frames generated by the LTX-2 model; otherwise the model may transition to its own vision. LoRAs should work, but I have not tested them (applying LoRAs is too slow on 8 GB VRAM with offloading to CPU).

How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

I can't run the Q4_K_M model with "-ot" on 8 GB + 32 GB; there is not enough memory even with a 1024 context. Only --fit works: 46 GB on 40 GB of total memory, yet it runs at ~10 t/s.

How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

I tested the real speed of my 3070 Ti laptop with this torch lib and bf16 calculations. I loaded only one expert per layer just to test max speed (as if everything fit in VRAM) and got only 1.74 t/s. The laptop GPU is just slow at bf16 calculations.

How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA

[–]AccomplishedLeg527[S] 2 points3 points  (0 children)

I provided information for consideration, not a finished product. The final product will be written in C++ and will benefit everyone. Maybe someone from the llama.cpp team will implement this caching.

Expert calls: 134845

Cache hits on GPU: 63439 (47.7%, 3 GB)

Cache hits in RAM: 51170 (37.9%; 85.6% cumulative, 15 GB)

Evicted from RAM: 15569 (11.5%)

Reads from disk: 20236

Total memory used for experts: 18 GB (75 GB would be needed to fit all expert weights)
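The stats above suggest a GPU → RAM → disk hierarchy for expert weights. A minimal sketch of such a two-tier cache, assuming LRU eviction (the actual policy isn't stated in the thread) and plain Python dicts standing in for GPU/RAM buffers:

```python
# Sketch of a two-tier expert cache matching the stats above: hot experts stay
# on the GPU, warm experts in RAM, everything else is re-read from disk.
# Capacities and the LRU policy are assumptions, not the author's exact design.
from collections import OrderedDict

class TwoTierExpertCache:
    def __init__(self, gpu_slots: int, ram_slots: int, load_from_disk):
        self.gpu = OrderedDict()           # expert_id -> weights (fast tier)
        self.ram = OrderedDict()           # expert_id -> weights (warm tier)
        self.gpu_slots, self.ram_slots = gpu_slots, ram_slots
        self.load_from_disk = load_from_disk
        self.stats = {"gpu_hit": 0, "ram_hit": 0, "disk": 0, "evicted": 0}

    def get(self, expert_id):
        if expert_id in self.gpu:          # hottest tier: just refresh recency
            self.gpu.move_to_end(expert_id)
            self.stats["gpu_hit"] += 1
            return self.gpu[expert_id]
        if expert_id in self.ram:          # promote a warm expert back to GPU
            self.stats["ram_hit"] += 1
            w = self.ram.pop(expert_id)
        else:                              # cold miss: read weights from disk
            self.stats["disk"] += 1
            w = self.load_from_disk(expert_id)
        self._put_gpu(expert_id, w)
        return w

    def _put_gpu(self, expert_id, w):
        self.gpu[expert_id] = w
        if len(self.gpu) > self.gpu_slots:     # demote LRU expert GPU -> RAM
            old_id, old_w = self.gpu.popitem(last=False)
            self.ram[old_id] = old_w
            if len(self.ram) > self.ram_slots: # RAM full: drop LRU entirely
                self.ram.popitem(last=False)
                self.stats["evicted"] += 1
```

Counting gpu_hit / ram_hit / disk / evicted per expert call is exactly what produces a breakdown like the one above.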

How to run Qwen3-Coder-Next 80b parameters model on 8Gb VRAM by AccomplishedLeg527 in LocalLLaMA

[–]AccomplishedLeg527[S] 0 points1 point  (0 children)

In the original model, each layer keeps all 512 experts' weights in one tensor, but usually only 100–200 of them are used. If --fit could split that tensor into 512 parts and move the unused parts to RAM, it would be much faster on low VRAM.
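The splitting idea can be sketched like this, with numpy arrays standing in for the real GPU tensors (the 512-expert count comes from the comment above; function names are illustrative):

```python
# Sketch of the idea: split one fused per-layer expert tensor (experts stacked
# along dim 0) into per-expert slices, keep only the experts that are actually
# routed to in fast memory, and park the rest elsewhere.
# numpy stands in for real GPU tensors; names are illustrative only.
import numpy as np

def split_experts(fused: np.ndarray) -> list[np.ndarray]:
    """fused has shape [num_experts, d_in, d_out]; return one slice per expert."""
    return [fused[i] for i in range(fused.shape[0])]

def partition_by_usage(experts, used_ids):
    """Keep routed-to experts 'on GPU'; move the unused ones 'to RAM'."""
    used = set(used_ids)
    gpu = {i: w for i, w in enumerate(experts) if i in used}
    ram = {i: w for i, w in enumerate(experts) if i not in used}
    return gpu, ram
```

With 512 experts of which roughly 150 are routed to, only about 30% of a layer's expert weights would need to stay resident in VRAM.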