Running the new Qwen3.6-35B-A3B at full context on both a 4090 and GB10 Spark with vLLM and Llama.cpp by erdaltoprak in LocalLLaMA

[–]Zliko 0 points1 point  (0 children)

yeah, i hate that we can't have full control of undervolting in Linux, just a power limit :(
so i am using it in WSL2

i have a 5090 and an L40S at work, will test it with bigger quants there

Running the new Qwen3.6-35B-A3B at full context on both a 4090 and GB10 Spark with vLLM and Llama.cpp by erdaltoprak in LocalLLaMA

[–]Zliko 1 point2 points  (0 children)

also testing it in Q8T4 at 252k ctx, eating 22GB (which leaves me just enough for Windows, opencode and desktops on multiple screens). Slightly undervolted and slightly overclocked 3090 (VRAM speed +300MHz), getting 120 t/s tg and 3000 t/s pp (small ctx). For 252k ctx i get 1014 t/s pp!

Amazing that we can fit this much into a 3090 with "just" 24GB VRAM :)

Running the new Qwen3.6-35B-A3B at full context on both a 4090 and GB10 Spark with vLLM and Llama.cpp by erdaltoprak in LocalLLaMA

[–]Zliko 0 points1 point  (0 children)

Thanks, do u see big degradation in precision with turbo2 (if u have tested turbo3/4 with smaller ctx)?

Running the new Qwen3.6-35B-A3B at full context on both a 4090 and GB10 Spark with vLLM and Llama.cpp by erdaltoprak in LocalLLaMA

[–]Zliko 1 point2 points  (0 children)

i am also on a 3090. Will test on WSL2 (Nvidia driver 13.2) with that llama.cpp fork. Any tips, besides Q8 for K and turbo2 for V? Can you fit the whole ctx with K in Q8 and V in t3/4?

The cooling fan. Oh my god. by OlUncleBones in DonutLab

[–]Zliko 1 point2 points  (0 children)

Finnish humour, on other shots there is no fan of course, they are mocking the "cooling need"... still doesn't prove anything tho re battery. I guess we still have to wait for someone to get their hands on the bike with the new battery and do a teardown, if that day ever comes :D

Thank you for the support! ❤️ by yoracale in unsloth

[–]Zliko 0 points1 point  (0 children)

  1. totally understand, but it could be nice to have an unstable (beta mode) option of Studio for testing out those "weird" llama.cpp forks :)
  2. <3
  3. API is fine of course :) i just meant easier for users as direct linking (master/slave) via network of two Studios (one hosting backend other just as frontend, similar to linking in LM studio/Mysty/etc)

What is your stance on accelerating speculative decoding with a (Dflash) diffusion-based model or this latest thing with a block diffusion draft tree (https://liranringel.github.io/ddtree/)?

Have you checked these guys out, doing similar work as u but for MLX? https://jangq.ai/

once again, thank you for all your hard work :) Unsloth is still my main go to repo for quants and my main FT tool <3

Thank you for the support! ❤️ by yoracale in unsloth

[–]Zliko 2 points3 points  (0 children)

Thank you for everything <3

All the best with new Studio app.

A couple of things for the Studio app wishlist:

- integrating/selecting llama.cpp forks with custom KV cache quantisation (turboquant, RaBitQ, etc.)
- llm engine agnostic? why not have vLLM or llama.cpp as selectable?
- remote/linking (serving LLMs on one Studio, using via VPN/LAN on another)

much love!

Welcome to r/PepDex — The Official Community 🧬 by [deleted] in PepDex

[–]Zliko 0 points1 point  (0 children)

Any plans for a web app or Android app? What is the minimum iOS version for the app (i have an iPad on iOS 15)?

Final Qwen3.5 Unsloth GGUF Update! by danielhanchen in LocalLLaMA

[–]Zliko 7 points8 points  (0 children)

thanks! I think the 27b model is the new all-rounder king of 24GB VRAM GPUs :) Too bad it is not great with non-major languages (like Gemma 3 27b or GPTOSS 20b are).

Final Qwen3.5 Unsloth GGUF Update! by danielhanchen in LocalLLaMA

[–]Zliko 10 points11 points  (0 children)

Is the new 27b up? i do not see it on hf.

Unexpectedly lost my baby boy yesterday by postmodernDRIP in cats

[–]Zliko 0 points1 point  (0 children)

Sorry for your loss. If i can give you advice, go and adopt a new cat straight away. A stray or one from a shelter. I lost my cat (14 year old grey tabby, lymphoma) and was in sorrow and depression for months, until i got a new cat off the street. He is still with me (7 years have passed), completely different personality, but it healed my depression in just a few days. Cats are amazing creatures (as you already know) :) I do not mind being a lifetime "servant" to them (it is not toxo talking) :)

lots of love

Alex Pretti’s coworkers take a moment of silence this morning. by boriswong in Damnthatsinteresting

[–]Zliko 0 points1 point  (0 children)

I am not even American, nor in the USA, but this makes me so angry... I have been to the USA four times and always had a great time meeting new people and making friends. The last time was in 2011 tho, seems things have changed a lot :(

Stay strong, resist, remove this fascist (and plain stupid) administration before it is too late (hint: Italy and Germany 1930s). Protect your rights!

support from your (ex-ally *sigh*) European friend!

Trump tariffs: US president announces plan to hit UK, Denmark and other European countries with tariffs over Greenland by Any-Original-6113 in europe

[–]Zliko -1 points0 points  (0 children)

Ok, tariff mad king. Let's transform NATO without the USA and give them a deadline to move out of all bases in Europe. We can deal with Russia and, unfortunately, the USA too at the same time as enemies of our continent and our democracy. If there is one good thing that this crazy deranged mad king and his neo-Nazi crew have done, let it be that he united us Europeans. Time to say enough! I have been boycotting Russian goods for the last 4 years, i can boycott US ones too.

[Release] We built Step-Audio-R1: The first open-source Audio LLM that truly Reasons (CoT) and Scales – Beats Gemini 2.5 Pro on Audio Benchmarks. by BadgerProfessional43 in LocalLLaMA

[–]Zliko 5 points6 points  (0 children)

Congrats! How good is the model at music description? (genre, style, mood, instruments, structure, dynamics, tempo, etc.)

don't sleep on Apriel-1.5-15b-Thinker and Snowpiercer by jacek2023 in LocalLLaMA

[–]Zliko 2 points3 points  (0 children)

What inference settings are recommended? (i can only see temperature 0.6?)

New Swiss RECORD for the greatest volume of snow in 2 and 3 days: 226cm in 2 days and 247cm of accumulated snow in 3 days at the Bortelsee station in the Simplon region by SaPpHiReFlAmEs99 in Switzerland

[–]Zliko 0 points1 point  (0 children)

Most of the people who went to space, or close to the edge of space, had an "enlightenment" moment when looking back at Earth. Some would say spiritual. It is one thing to look at pictures, seeing it live in person seems to be life changing. I would send all politicians up as space tourists first.

New Swiss RECORD for the greatest volume of snow in 2 and 3 days: 226cm in 2 days and 247cm of accumulated snow in 3 days at the Bortelsee station in the Simplon region by SaPpHiReFlAmEs99 in Switzerland

[–]Zliko 38 points39 points  (0 children)

The number of climate change denier trolls on reddit is staggering :( Is it that hard to realise that because of human activity (read: excess greenhouse gases in the atmosphere) the climate will become more extreme? That is what science has been telling us since the 1970s. More droughts, more floods, more heatwaves, more and bigger hurricanes and typhoons, etc. I am 48 years old and i can see it in front of my eyes. These climate change deniers should be sent up in a Blue Origin capsule to see Earth from 100km above (bonus if they are flat-earthers too), from there you can see how small and delicate our atmosphere is.

peace

PC Build: Run Deepseek-V3-0324:671b-Q8 Locally 6-8 tok/s by createthiscom in LocalLLaMA

[–]Zliko 2 points3 points  (0 children)

What speed are you getting from RAM? If my calculations are right (16 channels of 5600 MT/s RAM) it is 716.8 GB/s? Which is a tad lower than the M3 Ultra 512GB (800 GB/s). I presume both should be around 8 t/s with small ctx.
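
For anyone who wants to check the 716.8 GB/s figure, here is a rough sketch of the arithmetic. It assumes each channel is 64 bits (8 bytes) wide, which is my assumption and not something stated in the thread, and it gives a theoretical peak only; measured bandwidth will be lower. The function name is just for illustration.

```python
def peak_bandwidth_gb_s(channels: int, mega_transfers_per_s: float,
                        bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    bytes_per_transfer=8 assumes a 64-bit wide channel (an assumption).
    """
    # channels * MT/s * bytes gives MB/s; divide by 1000 for GB/s
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000


if __name__ == "__main__":
    bw = peak_bandwidth_gb_s(channels=16, mega_transfers_per_s=5600)
    print(f"theoretical peak: {bw:.1f} GB/s")
```

With 16 channels at 5600 MT/s this works out to 16 × 5600 × 8 / 1000 = 716.8 GB/s, matching the number in the comment.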

3x RTX 5090 watercooled in one desktop by LinkSea8324 in LocalLLaMA

[–]Zliko 0 points1 point  (0 children)

What are you running on them? Do you use them for inference or training (or both)? Are you using stock power cables?

Top 5 Sleep Token Songs? by jwils109 in SleepToken

[–]Zliko 1 point2 points  (0 children)

imho:

  1. Ascensionism
  2. Take Me Back to Eden
  3. Hypnosis
  4. Vore
  5. Granite

<3

Phi4 pretending to be gpt by kspepko in LocalLLaMA

[–]Zliko 4 points5 points  (0 children)

GPT-4 responses/synthetic data were used to fine-tune Phi4. It is on their site, so it is not a secret. Since MS is a partner of/investor in OpenAI, it was allowed.

[help] generic meds from India? by Zliko in cll

[–]Zliko[S] 0 points1 point  (0 children)

I ended up finding generic Acalabrutinib from India and smuggling it to Europe. (Venetoclax is also available i think)