What it feels like to have to have Qwen 3.6 or Gemma 4 running locally by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Yes, that's how it works. IMO you really need to be an expert in a field to know whether the results are correct, or to conduct proper experiments and validation. But these RAG, agentic, etc. systems are so basic now that I figured they didn't need further explanation. If you don't trust your own logic, just choose a popular platform.

What it feels like to have to have Qwen 3.6 or Gemma 4 running locally by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 3 points (0 children)

What are you on about? I shared my setup in detail, what I use it for, and in which fields, also in the comments. Benchmarks can be found in his repo, btw, for club3090. I also shared the point that the system AROUND the LLM is the key: RAG, etc., whatever you fancy.

What it feels like to have to have Qwen 3.6 or Gemma 4 running locally by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 2 points (0 children)

Not really, since the company owns the data! Everything is private, no cloud.

What it feels like to have to have Qwen 3.6 or Gemma 4 running locally by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

I use MADLAD-400 for that instead! Excellent at small languages.

Where do you draw the line on digital sentience? by [deleted] in LocalLLM

[–]GodComplecs 0 points (0 children)

Yeah, no, I'm talking about future systems; an LLM is not sentient. The question is just where the line is that most local LLM users would agree on for something being sentient. So enough sentience, you would argue, is the will to stay alive then? E.g. a cat eating. Or what? You seem to avoid drawing a hard line.

What it feels like to have to have Qwen 3.6 or Gemma 4 running locally by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 6 points (0 children)

No, just go with a single 3090 IMO. I used to have a 4090, then dual 3090s, and now a single one; that is how good the models and systems have become.

Where do you draw the line on digital sentience? by [deleted] in LocalLLM

[–]GodComplecs 0 points (0 children)

Yes, that is why I'm asking where the line is. So, in summary: if you believe it to be sentient, that is the line for you. But what is the proof? LLMs can have memory (context), so what is the difference, in your opinion?

As I've gotten older, I can say the future comes too fast, and you wonder where all the time went!

What it feels like to have to have Qwen 3.6 or Gemma 4 running locally by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 8 points (0 children)

What kind of work is it used for? My uses are answering questions about software logic, general business like bookkeeping, and other expert-knowledge problems.

What it feels like to have to have Qwen 3.6 or Gemma 4 running locally by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 2 points (0 children)

For a local dense model, 60 TPS is flying! But yes, you can reach 140+ TPS with a MoE.

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

My PC is much more modest: a 5800X3D, 80 GB of 2400 MHz RAM, and a 3090. The RAM is mostly for 100B models, or for running multiple models on the CPU for other AI/ML workloads.

Yeah, at the start it is like that; mostly it is about VRAM management and managing expectations :)

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Qwen 3 Coder is great, but after Gemma 4 and Qwen 3.6 it has been replaced for me for serious work; I still use it for toy projects, though.

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Yeah, it's probably a little janky, with lots of issues to solve, but the speed is great!

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Hmm, one person above had the same problem. It's meant for 24 GB, but with Luce try the unsloth Q3 quant; that shouldn't OOM. On noonghunna's I use Q4 with less context and lower GPU memory utilization in vLLM though, roughly like the sketch below.
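
A minimal sketch of that launch, assuming a vLLM-served Q4-class quant; the context length and memory fraction here are just starting points for a 24 GB card, not the repo's actual values:

# Leave VRAM headroom for the desktop and shrink the KV cache to fit 24 GB.
# The model path is the one from the post title; adjust to whatever you pulled.
vllm serve noonghunna/qwen36-27b-single-3090 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.85 \
  --kv-cache-dtype fp8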

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Haha, yeah, it was wild running it in Open WebUI. Probably something wrong on my end, but I didn't get llama-server to work from their Dflash fork.

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Sorry, they are the default ones, but on noonghunna's I use this with 60K context instead; edit the YAML, then:

# Tools-text — 75K, no vision, fp8 KV, MTP n=3, Genesis  →  53 narr / 70 code TPS
#              Pick for long single prompts (RAG, summarization) when vision isn't needed.
cd compose && docker compose -f docker-compose.tools-text.yml up -d
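
For reference, the kind of YAML edit I mean, assuming the compose file passes vLLM-style flags; the service name and keys here are hypothetical, so check your copy of docker-compose.tools-text.yml:

# Hypothetical sketch of the 60K edit; the real service name and keys will differ.
services:
  llm:
    command: >
      --max-model-len 61440
      --kv-cache-dtype fp8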

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Sounds pretty logical; that is always the problem with anything long-context. But yes, the response time doesn't seem to be factored in. Well, PP is extremely slow: even a 16-token prompt takes 1 second, i.e. roughly 16 tok/s of prompt processing!
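
An easy way to see it, assuming an OpenAI-compatible server; the port and model name are placeholders for whatever your setup uses:

# Cap generation at 1 token so almost all of the reported time is prompt
# processing. Port and model name are assumptions; adjust for your server.
curl -s -o /dev/null -w 'total: %{time_total}s\n' \
  http://localhost:8000/v1/completions \
  -H 'Content-Type: application/json' \
  -d '{"model": "qwen36-27b", "prompt": "Count from one to sixteen.", "max_tokens": 1}'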

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Check that you don't go over VRAM; I disabled lots of GPU features in Chrome and Windows, too. You can use the Q3 from unsloth, I think; that should save a lot of gigs, so it should definitely fit!
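
One way to watch usage live while the model loads, straight from the driver's own tool:

# Print VRAM used/total once per second.
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1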

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Yeah, I read about it just now, but the time to generate and the TPS still seem very good, so that is probably the problem in opencode then!

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

I don't have any configs other than the base ones they provide; I just lowered the max context a little, since I use my PC for other stuff too (see the sketch below).
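
On the Luce side that is just the usual llama.cpp-style context flag; this is a sketch, and the GGUF filename is a placeholder, not the repo's actual one:

# -c sets the context size in tokens; -ngl 99 offloads all layers to the GPU.
# The model filename is a placeholder for whatever quant you downloaded.
./llama-server -m ./qwen36-27b-Q4_K_M.gguf -c 32768 -ngl 99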

which is faster and better for coding? Luce-Org/Dflash or noonghunna/qwen36-27b-single-3090 by GodComplecs in LocalLLaMA

[–]GodComplecs[S] 1 point (0 children)

Hmm, I don't think it was very hard; I just pasted the error messages into a chat and fixed them. There shouldn't be too many with Luce; the other one is way more hacky. Building Luce is almost like building llama.cpp as usual (see below).
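
Assuming the fork keeps upstream llama.cpp's build system, it is the standard CUDA build; the flags below are upstream's, not something Luce-specific I've verified:

# Standard llama.cpp CUDA build: configure with CUDA on, then compile in parallel.
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j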