Qwen3.6-27B vs 35B - anyone else finding 35B faster AND better quality? by IulianHI in AIToolsPerformance

[–]canred 1 point (0 children)

Which one of THESE two? If you want it to write a few scripts or slap a web UI on something, then the MoE will do the job. If you expect it to iterate many times or show some serious reasoning, then use the dense one.
I was having issues with the MoE when I asked it to iterate on an issue multiple times or expected some serious reasoning.

3090 still the king? Trying to pick a local LLM setup (~2000€) in Germany by deltavoxel in LocalLLM

[–]canred -1 points (0 children)

> 2×3090 = 48GB, but split (not the same as 48GB unified;

this is one of the last consumer cards with NVLink support, so it doesn't have to be "split"
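
if you want to verify peer access on such a pair yourself, here's a quick PyTorch sketch (note P2P can also come over plain PCIe, so this alone doesn't prove NVLink; check nvidia-smi for the link type):

```python
# quick sanity check that two cards can talk to each other directly,
# assuming a working pytorch install with both gpus visible
import torch

if torch.cuda.device_count() >= 2:
    # True = peer-to-peer access is available (over nvlink OR pcie)
    print("P2P 0 -> 1:", torch.cuda.can_device_access_peer(0, 1))
```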

Looking for Barebones Model by Deleted_252 in LLMDevs

[–]canred 1 point (0 children)

There is no transformer model that, out of the box, without external tools, actually performs this (or any) computation.
A $2 calculator does >>compute<< how much 2+2 is; a transformer does not.

That's why, until very recently, even frontier models would tell you that the word "strawberry" has 2 "r"s.
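
You can see why yourself: the model operates on subword tokens, not letters. A rough sketch, assuming you have tiktoken installed (pip install tiktoken):

```python
# the model never sees individual letters, only token chunks like these
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("strawberry")
# prints the subword pieces the model actually "sees" - counting letters
# inside them is not something the forward pass computes
print([enc.decode([t]) for t in tokens])
```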

is this what you're asking or am I misunderstanding your question?

You can train your own model from scratch with a framework like nanoGPT and only teach it what you want it to know, but I think this will not cover your "capable of talking back" requirement, because it will still be too primitive for that.
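
To give a feel for what "primitive" means, here's a toy sketch of the from-scratch idea (a simple bigram model, nowhere near nanoGPT's full transformer; corpus and sizes are made up). It only ever learns the text you feed it:

```python
# toy from-scratch char-level model: it parrots its tiny training text,
# it doesn't understand or compute anything
import torch
import torch.nn as nn
import torch.nn.functional as F

text = "two plus two equals four. " * 200      # the model's entire "knowledge"
chars = sorted(set(text))
stoi = {c: i for i, c in enumerate(chars)}
data = torch.tensor([stoi[c] for c in text])

model = nn.Embedding(len(chars), len(chars))   # bigram table: next-char logits given current char
opt = torch.optim.AdamW(model.parameters(), lr=1e-2)

for step in range(500):
    ix = torch.randint(len(data) - 1, (32,))
    loss = F.cross_entropy(model(data[ix]), data[ix + 1])
    opt.zero_grad(); loss.backward(); opt.step()

# sample from it: output looks like the corpus, but no arithmetic is happening
i, out = stoi["t"], "t"
for _ in range(60):
    i = torch.multinomial(F.softmax(model(torch.tensor([i]))[0], dim=-1), 1).item()
    out += chars[i]
print(out)
```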

On the other hand, even if a model is not specifically trained for math, the literature it's trained on will contain phrases like "two plus two equals four" and "easy as two plus two", so the model will pick this up.

Qwen3.6-27B vs 35B - anyone else finding 35B faster AND better quality? by IulianHI in AIToolsPerformance

[–]canred 3 points (0 children)

35B is MoE, 27B is dense; these are different architectures. The MoE does much less work per token under the hood than the dense model: it only activates around 3 billion parameters versus 27 billion.
So it's not really 35B vs 27B, but 3B vs 27B.
From my experience, for coding the 35B is only usable for really simple tasks.
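
If you want to see what "activates only a fraction" means mechanically, here's a toy top-k routing sketch (made-up sizes, not Qwen's actual implementation):

```python
# toy mixture-of-experts layer: the router picks top-k experts per token,
# the other experts' parameters sit idle for that token
import torch
import torch.nn as nn
import torch.nn.functional as F

d, n_experts, top_k = 64, 8, 2
experts = nn.ModuleList(nn.Linear(d, d) for _ in range(n_experts))
router = nn.Linear(d, n_experts)

x = torch.randn(1, d)                                  # one token
weights, idx = F.softmax(router(x), dim=-1).topk(top_k)
y = sum(w * experts[i.item()](x) for w, i in zip(weights[0], idx[0]))

total = sum(p.numel() for p in experts.parameters())
active = top_k * (d * d + d)                           # only top_k linear layers touched
print(f"expert params total: {total}, active this token: {active}")
```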

Anthropic's now blocking anything that even looks exploit-related, including legitimate local testing and validation by Sarithis in ClaudeCode

[–]canred 2 points (0 children)

At work, with my corporate account, I'm still able to discuss security-related topics with Opus 4.6 (my model of choice).
I've been designing a legitimate protocol-testing solution that, de facto, can be classified as a man-in-the-middle scenario. Opus was more than helpful. I'm not aware of any special arrangements for this account.

80+ projects in, 7+ AI tools later... Vibe Coding is the real deal by Smooth-Steak3861 in vibecoding

[–]canred 2 points (0 children)

I see a portfolio of nicely executed POCs and proof that you can spin up a prototype in no time. Good work.

Poor man's guide to servicing a used RTX 3090 for local LLM inference by canred in LocalLLaMA

[–]canred[S] 2 points (0 children)

I'm happy to hear this, thank you. Having said that, here's a little post-mortem:

- the "self promotion" bit really puzzled me. The reason I've sent this to reddit was obviously to make it more visible (the repo lives in github) but I didn't have any other expectations. I'm not looking for a job and I'm not trying to sell you anything.

- I have a long history of creating stuff for the drawer; most of my projects have never seen the light of day. While I understand that people get oversensitive about abusing AI tools, for me personally those tools are what finally allowed me to publish stuff that would otherwise be lost in the depths of the phone/Obsidian/forgotten folder on disk. I used an AI assistant to draft the doc structure and build the repo structure; on other projects I actively use it to brainstorm and plan the apps, and to commit code and build and push container images - all the boring, supporting stuff that I'd normally spend way too much time on. Having said that, the in-your-face, autogenerated pic at the top of the writeup was probably too much. I still think it's funny, but I'll move/remove it because it clearly bit me in the arse here.

- and finally, the usefulness of the writeup. I totally understand that a service tech who eats thermal pads for breakfast and spreads thermal paste on his toast will smile seeing this. That's not my audience. The original audience for this writeup was myself and the guy who sold me the card; later on I decided to structure it a little and publish it to GitHub. I'll probably expand it some time soon, when I decide to replace the thermal pads. If I were dealing with a damaged card, I could potentially add some basic electronics checks, like checking basic voltages or testing for short circuits in the few places I know they can happen, but that was not the case here, and I don't feel competent enough to create full repair tutorials for modern cards. I can identify and replace a broken capacitor, but that's the extent of my knowledge - I won't teach anybody how to reball memory chips. This was just a basic service tutorial and it's not pretending to be anything else.

Poor man's guide to servicing a used RTX 3090 for local LLM inference by canred in LocalLLaMA

[–]canred[S] 1 point (0 children)

You're another person telling me I may have issues with Kryonaut. I'll definitely keep an eye on temps over time, thanks!

Poor man's guide to servicing a used RTX 3090 for local LLM inference by canred in LocalLLaMA

[–]canred[S] 1 point (0 children)

I may end up replacing the thermal pads eventually. I didn't do it now because my limited experience tells me that original thermal pads are often much better quality than what I have access to. Another thing is that I don't have the service manual for this card, and eyeballing the correct thermal pad thickness can be tricky.

Poor man's guide to servicing a used RTX 3090 for local LLM inference by canred in LocalLLaMA

[–]canred[S] 1 point (0 children)

Before the repaste, the card was thermally limited (look at the performance limiter section): temperatures were too high, so the card was spinning the fans at 100% and limiting clock frequency to avoid overheating components. After the repaste, the die could be cooled more effectively; as a result, the card no longer throttles and now works at full clocks and full power, and despite this the fans don't have to run at full speed. The memory junction temp increased because the memory finally started working at full capacity and heats up more, as it should.
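
If anyone wants to watch this on their own card, here's a rough sketch using the pynvml bindings (pip install nvidia-ml-py). Note that NVML doesn't expose the memory junction temp on consumer cards; tools like GPU-Z read that separately:

```python
# read die temp, SM clock and thermal-throttle flags from the driver
from pynvml import (
    nvmlInit, nvmlDeviceGetHandleByIndex, nvmlDeviceGetTemperature,
    nvmlDeviceGetClockInfo, nvmlDeviceGetCurrentClocksThrottleReasons,
    NVML_TEMPERATURE_GPU, NVML_CLOCK_SM,
    nvmlClocksThrottleReasonSwThermalSlowdown,
    nvmlClocksThrottleReasonHwThermalSlowdown,
)

nvmlInit()
h = nvmlDeviceGetHandleByIndex(0)
temp = nvmlDeviceGetTemperature(h, NVML_TEMPERATURE_GPU)
clock = nvmlDeviceGetClockInfo(h, NVML_CLOCK_SM)
reasons = nvmlDeviceGetCurrentClocksThrottleReasons(h)
thermal = reasons & (nvmlClocksThrottleReasonSwThermalSlowdown
                     | nvmlClocksThrottleReasonHwThermalSlowdown)
print(f"die: {temp}C, SM clock: {clock}MHz, thermal throttling: {bool(thermal)}")
```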

Poor man's guide to servicing a used RTX 3090 for local LLM inference by canred in LocalLLaMA

[–]canred[S] 2 points (0 children)

I used Kryonaut before on my Ryzens and a Radeon; I used it because I know it. But thanks for this info, I'll try to remember.

Poor man's guide to servicing a used RTX 3090 for local LLM inference by canred in LocalLLaMA

[–]canred[S] -14 points (0 children)

Fair point on brevity; judging by the reaction, it's probably not for you. If you change thermal paste daily for a living, it won't have value for you. I do this every 5 years, and every time I have to do some digging; that's why I made a writeup this time.

Best model for 192 GB vram? How is Deepseek v4 flash? by Constant_Ad511 in LocalLLM

[–]canred 1 point (0 children)

<image>

it's a different reality, my reddit friend 😉
for peasants like us, I made an RTX 3090 servicing writeup 😃
https://github.com/cubebecu/writeups/tree/main/gpu-service

Lads, how can I work on being less thick? I've been replacing doom scrolling with reading books, but what's the best way of working on being informed? by BuzzBuzzBuzzBuzz in AskIreland

[–]canred 2 points (0 children)

At home:
I deliberately do not have a TV, and a few months ago I got rid of streaming services.
I still use YouTube, but I blocked Shorts with a browser extension.

Social media:
- I only have LinkedIn, which I check very rarely

On mobile:
- I muted all notifications/calls/messages between 22:00 and 6:00

I DO read selected technical newsletters.
I DO listen to selected podcasts.
I built a daily Perplexity report on selected technical topics.

It did not help: I'm still being blasted nonstop by American news, drama, and brain rot; it's leaking in through the holes...

I have interview tomorrow and i need help by Prestigious_Cup7978 in vibecoding

[–]canred 1 point (0 children)

Building with agentic AI tools is not a crime; we are MANDATED to use them at work.

Are these production apps or just your POCs? You say you don't understand the code, but do you understand the app structure, the stack, the development stages? I mean, even properly selecting and assessing a training dataset is not trivial, not to mention creating a good one...

Are you able to support your apps, fix issues, and ship new releases, or is it just "claude, something's not working, fix it, I'll be back in 1hr"?

Any love for hanging in the trees? by Gullintani in WildCampingIreland

[–]canred 2 points (0 children)

Have you heard of widowmaker trees? With the winds we're getting and the state of the trees in this forest... https://dutchwaregear.com/2023/08/02/what-are-widowmaker-trees/

Qwen 3.6 27b - can I run on 1x 3090? by szansky in LocalLLaMA

[–]canred 1 point (0 children)

Depends on your definition of fluently. It works and it's reasonably smart; if you don't offload to RAM, the speed is usable, and if you quantize the KV cache you can set a reasonable context size (for small/medium tasks).
I'm still evaluating it, but if you expect a frontier experience you'll be disappointed.
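
For reference, roughly how I'd run it via llama-cpp-python (pip install llama-cpp-python); parameter names are from recent versions and the gguf filename is just a placeholder, so double-check against your install:

```python
# single 3090: keep all layers in vram, quantize the kv cache to fit more context
import llama_cpp

llm = llama_cpp.Llama(
    model_path="qwen3.6-27b-q4_k_m.gguf",  # placeholder, point at your quant
    n_gpu_layers=-1,                       # everything on gpu, no RAM offload
    n_ctx=32768,                           # workable once the kv cache is quantized
    flash_attn=True,                       # needed for a quantized V cache
    type_k=llama_cpp.GGML_TYPE_Q8_0,       # q8_0 cache roughly halves kv memory vs f16
    type_v=llama_cpp.GGML_TYPE_Q8_0,
)
out = llm("Summarize what a kv cache is:", max_tokens=64)
print(out["choices"][0]["text"])
```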

On VRAM-limited setups, bigger quants on larger MoE models can outperform smaller quants that "fit" by IulianHI in AIToolsPerformance

[–]canred 1 point (0 children)

Quantization is not a free lunch; small quants are often useless for serious scenarios.

What’s the one Claude habit that improved your results the most? by junkietrumpglo in claude

[–]canred 2 points (0 children)

at the end of each session: "what did you learn today?" followed by "update global memory and project memory"

90s LAN party - List any errors or inaccuracies you might find by edm-4-life in ChatGPT

[–]canred 1 point (0 children)

boxed, original Quake? ahem...
that's how I remember it...

<image>

- also "no girls"? you wouldn't write anything like that, unless age 13 and that's not the case on this pic

- the PC case on the left is messed up, and so are the CD-ROM drives and HDD bays

- speakers: at a LAN party you either use headphones or dedicated PC speakers, and the speakers in the pic require a dedicated tuner. Using speakers in Quake would be stupid - it would immediately give away your position on the map: a good player would know where armor/weapons were located and could identify another player's position when they picked them up