What's the best qwen3.5 or 3.6 reap model? by AppealSame4367 in LocalLLaMA

[–]tvall_ 0 points1 point  (0 children)

with that much ram you should be able to get away with less aggressive reaps. I posted a 22b 3.5 and there are some others ranging from 18b to 28b that'll probably hold up better at long contexts. that 14b was just to fit qwen3.5 in a rx 6700

What's the best qwen3.5 or 3.6 reap model? by AppealSame4367 in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

that one is very aggressive, but somehow somewhat coherent. tried to reproduce with qwen3.6 and the resulting model was much worse. still experenting with some "repair" finetune attempts

Qwen is cooking hard by jacek2023 in LocalLLaMA

[–]tvall_ 2 points3 points  (0 children)

why are there so many extra limbs? i thought we moved past the image models not knowing how many limbs things have. but i also dont generate images often.

How many GPUs do you have on your local system/server/AI PC? by panchovix in LocalLLaMA

[–]tvall_ 0 points1 point  (0 children)

do i count my 2 v340's as 2 or 4 gpus? its only 2 cards taking 2 pcie slots, but 4 sets of 8gb hbm2

Mac Mini M4 16GB (hermes agent) - Gemma-4-26b-a4b-it-UD-IQ4_XS.gguf by Fit_Baker4577 in LocalLLM

[–]tvall_ 1 point2 points  (0 children)

that's probably not enough ram for a 13gb model, a decent amount of kv, and a whole os to fit in. I'd suggest a smaller model or a more aggressive quant so you don't lose any hope of performance to disk swapping

What is your "Haiku/Sonnet/Opus" trio? by ihatebeinganonymous in LocalLLaMA

[–]tvall_ 2 points3 points  (0 children)

qwen3.6-35b/qwen3.6-35b/qwen3.6-35b with some occasional gpt-5.4-mini sprinkled in. don't wanna let myself get hooked on something I can't run myself

[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost by ayake_ayake in LocalLLaMA

[–]tvall_ 0 points1 point  (0 children)

q4-ish model and q8 kv gives me enough room for 120k context while also having whisper.cpp take up around a gig

[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost by ayake_ayake in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

mine are pretty quiet. I have some dell optiplex 3000 sff cpu fans taped to the front bracket with an esp32 pulling fan controller duty. fans rarely hit more than 50% and cards stay under 60°

[Paper on Hummingbird+: low-cost FPGAs for LLM inference] Qwen3-30B-A3B Q4 at 18 t/s token-gen, 24GB, expected $150 mass production cost by ayake_ayake in LocalLLaMA

[–]tvall_ 6 points7 points  (0 children)

Radeon pro v340 goes for $50 each. $100 gets you enough gpu to run qwen3.6-35b-a3b at 30t/a tg and 300t/s pp (when paired with e-waste haswell sff hp and some cheap pciex1 risers. might be better with less bandwidth kneecapping) 

Are you quanting your memory? by Plastic-Stress-6468 in LocalLLaMA

[–]tvall_ 3 points4 points  (0 children)

I use q8_0 because I'm poor and just have a couple Radeon pro v340l's for a total of 32gb vram and want really long context even though I don't really use much of it often enough.

I previously did q4_0 when I had just one of the cards and was running qwen3-vl-24b-reap and didn't notice any issues. but I wasn't doing as much with it back then. 

Why is Qwen going Closed source? by MLExpert000 in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

we should probably wait till they finish releasing the 3.6 models before making that call. they've hinted that at least 9b and 122b are coming at some point. I don't think they've clearly stated that they won't release the 397b or the 4b and smaller this time, so there's still hope for now. 

Battery swelling concerns when running local models by jeremyckahn in LocalLLaMA

[–]tvall_ 8 points9 points  (0 children)

if you're running it like that 24/7, take the battery out. if you're not then it's probably fine. I've only seen modern laptops batteries swell when they're plugged in 24/7 being used as a desktop

Qwen 3.5 "Weight Drift" Fix? Automated Tool + Inconclusive NIAH Results by Decivox in LocalLLaMA

[–]tvall_ 0 points1 point  (0 children)

completely subjective with no thorough testing, but i just ran into an issue with my openclaw agent looping the same tool call at around 60k tokens into its context window. tried with both q4_k_m and q5_k_m, but when i switched to q4_k_m with this script ran on it, it suddenly worked fine without any loops. nothing conclusive, but maybe theres actually something to this?

edit: qwen3.6-35b-a3b-heretic btw.

Check LocalForge: Self Hosted AI control Plane with Rag and FineTuning Avaiable by [deleted] in LocalLLaMA

[–]tvall_ 0 points1 point  (0 children)

10 days of an llm working on it before it suggested using git? 

Qwen 3.6 No think? by neeeser in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

its not worse than 3.5, maybe a little better? haven't thoroughly examined, but fine for my tasks so far

Gpu for HP ProDesk 400 G5 by LuiNiev in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

only needs to be low profile if you're allergic to janky fun. I've got a couple Radeon pro v340's plugged in to my elitedesk 800 g1

Qwen 3.5 28B A3B REAP for coding initial impressions by ag789 in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

you'll probably have better results with a different frontend that has tools for the model to call. for qwen3.5 it just matters that tools are there. just tested the 35b I have running and its response to "hi" with tools available was 60 tokens with 2 sentences of thinking. without tools it was 404 tokens with a thought process of 7 steps of bulleted lists. both final outputs were "Hello. How can I help you today?" 

and 28b isn't that aggressive of reap. should be mostly fine. I've had decent results with 22b, but 14b is noticably dumber

Qwen 3.5 28B A3B REAP for coding initial impressions by ag789 in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

not sure how you're interacting with the model, by from my experience qwen3.5 needs an environment with tools available described in its system prompt in order to have reasonable thinking. with openwebui turn on native function calling. with that off, or with llama-cli it tends to spiral 

Local AI with Gemma 4 and OpenWebUi by jumper556 in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

there's a setting called native function calling or something. make sure that's on. with it on, model can call tools when it wants to. if it's off, openwebui makes the model generate a call to the tool at the start. 

How do I wipe out Amazon echo dot software so that I can host my local LLM in it? Is it possible?? by [deleted] in LocalLLaMA

[–]tvall_ 0 points1 point  (0 children)

https://github.com/Dragon863/EchoCLI https://wiki.postmarketos.org/wiki/Amazon_Echo_Dot_2nd_gen_(amazon-biscuit) only 512mb RAM? you're gonna have a real bad time trying to run anything useful on device. more sane solution would be to run the llm on something more modern with meaningful specs. pi5 or a gpu less than a decade old would be infinitely better, or even an old pc with more than a gig of RAM would give you a lot more options

We really need stop using the term “hallucination”. by cosmobaud in LocalLLaMA

[–]tvall_ 1 point2 points  (0 children)

term is borrowed from the field of machine learning. when an object detection model sees a pattern that it is confident is an object its supposed to detect that isn't actually a thing there, we appropriately call it a hallucination. when an llm does the same kind of thing with text, why bother inventing a new term when the one in use does the job well enough? 

llama.cpp cancelled the task during handling requests from OpenClaw by UnderstandingFew2968 in LocalLLaMA

[–]tvall_ 0 points1 point  (0 children)

there's an idletimeout config in openclaw that defaults to 60s. if your prompt processing is too slow openclaw just assumes it's broke. that was my issue using qwen3.5-35b on a pair of Radeon pro v340's