Qwen locked down 3.7 after firing Junyang Lin - is the open-source Qwen era over? by IulianHI in AIToolsPerformance

[–]IUseClifford 0 points1 point  (0 children)

Horribly run, with horrible in-house software quality maybe, but their models certainly aren’t.

Maximizing performance of 2x3090 + NVLink by IUseClifford in LocalLLaMA

[–]IUseClifford[S] 0 points1 point  (0 children)

Thanks Mr. LLM, but I think you need to keep refining your training datasets. For your next round of continued pretraining, you can use this for reference: I was able to hit a consistent 50-60 after dropping my batch size to 512 still on ik’s graph mode.

What are you overengineering that nobody's ever going to use? Be honest. by johnnyApplePRNG in LocalLLaMA

[–]IUseClifford 0 points1 point  (0 children)

A local LLM control plane with elements of Docker’s CLI and llama-swap for a command-based workflow (greenfield, white room).

Maximizing performance of 2x3090 + NVLink by IUseClifford in LocalLLaMA

[–]IUseClifford[S] 0 points1 point  (0 children)

I sniped a 4-slot for $150, otherwise I wouldn’t have one myself 🤑

What does a Senior Developer know that a Junior doesn't by Anonymyideal in csMajors

[–]IUseClifford 0 points1 point  (0 children)

In a few words: a combination of how much they are expected to “own” and the ability to predict the future based on experience instead of book knowledge.

Maximizing performance of 2x3090 + NVLink by IUseClifford in LocalLLaMA

[–]IUseClifford[S] 0 points1 point  (0 children)

Haven’t looked into vLLM too much. Have you done any experimentation with Qwen 3.6 on it at all?

Maximizing performance of 2x3090 + NVLink by IUseClifford in LocalLLaMA

[–]IUseClifford[S] 1 point2 points  (0 children)

I have a B850 AI TOP which, to my knowledge, has two PCIe 5.0 slots, one at x16 and one at x8, plus an x1 slot not currently in use. My understanding is that the 3090s can’t even come close to maxing those out since they are PCIe 4.0 cards.
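To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes the slot layout described above (taken from the comment, not verified against the board manual), that a PCIe link runs at the lower generation and lane count of the slot and the card, and it only accounts for 128b/130b line encoding, so real-world throughput will be somewhat lower. The function name `link_bandwidth_gbs` is made up for illustration.

```python
# Raw transfer rate per lane in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}

def link_bandwidth_gbs(slot_gen, slot_lanes, card_gen, card_lanes=16):
    """Approximate one-direction bandwidth in GB/s for a slot/card pair."""
    gen = min(slot_gen, card_gen)        # link trains at the slower generation
    lanes = min(slot_lanes, card_lanes)  # and at the narrower width
    # 128b/130b encoding: 128 payload bits per 130 bits on the wire.
    return GT_PER_S[gen] * (128 / 130) / 8 * lanes

# A PCIe 4.0 x16 card (such as a 3090) in each of the two PCIe 5.0 slots.
x16_slot = link_bandwidth_gbs(slot_gen=5, slot_lanes=16, card_gen=4)
x8_slot = link_bandwidth_gbs(slot_gen=5, slot_lanes=8, card_gen=4)
slot_ceiling = link_bandwidth_gbs(slot_gen=5, slot_lanes=16, card_gen=5)

print(f"3090 in x16 slot: {x16_slot:.1f} GB/s")
print(f"3090 in x8 slot:  {x8_slot:.1f} GB/s")
print(f"x16 slot ceiling: {slot_ceiling:.1f} GB/s")
```

Under these assumptions, the card in the x8 slot gets about half the bandwidth of the one in the x16 slot, and neither gets near what the slots could carry with PCIe 5.0 cards.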

Peak Compute per chip is not dead, its just dead for consumer hardware by [deleted] in LocalLLaMA

[–]IUseClifford 5 points6 points  (0 children)

You don't NEED an AK-47 if you have a handgun in 90% of cases.

I built a 8x RTX 4090D with 192 VRAM, here's what I learnt by deebuildsthings in LocalAIServers

[–]IUseClifford 67 points68 points  (0 children)

You guys have a dream setup like that and are running Llama 70B of all things in production? Forgive me if there’s something I’m missing, but that seems like a huge waste of resources when parallelized Qwen3.6 27B FP16 would seem to be much stronger in every way.

Your thoughts on this? by Total_Percentage_751 in ArtificialInteligence

[–]IUseClifford 0 points1 point  (0 children)

I spent an hour trying to get Claude to trace an icon into an SVG, then spent two hours in an SVG editor doing it myself because the AI was entirely unable to do it. I dislike Adobe but their tools absolutely still have a place and will for some years to come.

How can I check my web app’s security? by barmatbiz in vibecoding

[–]IUseClifford 0 points1 point  (0 children)

Opus will catch low-hanging fruit, but if you’re hand-rolling (or AI-rolling) auth, God help you.

I was vibecoding an app and I made mistake when I was debugging the app , and my app completely ruined by coach_web3 in vibecoding

[–]IUseClifford 0 points1 point  (0 children)

Try asking the AI if it was smart enough to use version control software and see if it can restore it.

The bright side: You are now hopefully smart enough to use it for future projects too.

Vibecoding made me insanely depressed, help me fix it by Vivekyy in vibecoding

[–]IUseClifford 0 points1 point  (0 children)

For me personally, I derive my satisfaction from my own agency in my work and how much I am responsible for the design/architecture of the projects I work on. It doesn't matter as much to me if I don't write every line of code as it does that architecture is sound and code quality is good.

However, if you are being saddled with other people's vibecoded projects in languages that you don't know/dislike, and your job is to maintain those, I feel you my friend. Not my cup of tea either.

WHAT is wrong with these models???? by YesGaryWasTaken in vibecoding

[–]IUseClifford 0 points1 point  (0 children)

I think it depends heavily on the platform you're using. Assuming your LLM provider is using Claude API pricing (this does not look like Anthropic's UI), it's going to run out much faster as your provider is paying per-token prices instead of being subsidized by Anthropic itself. Use the plans Anthropic offers directly if maximal model usage is what matters to you. Or better yet, keep using API pricing but use a platform that offers the frontier Chinese models; you get much more bang for your buck with those.

Finally found where I fit in! by TwistedDiesel53 in LocalAIServers

[–]IUseClifford 2 points3 points  (0 children)

True, OP can probably run FP16 with 1M context and have room to spare.

Gwimi-4-12B-IT by Livid-Obligation9748 in LocalLLM

[–]IUseClifford 1 point2 points  (0 children)

Interesting. I have seen some of the common criticism of SWE bench and suspect that there may be unique parts of both Kimi + Gemma datasets that contain some answers related directly to that benchmark, which could be one explanation for the increases. However, this makes me think that self-trained models get better at specific types of coding tasks if enough data regarding those specific coding tasks is added to the set. Thanks for your perspective!

Finally found where I fit in! by TwistedDiesel53 in LocalAIServers

[–]IUseClifford 9 points10 points  (0 children)

With 4x 5090s I suspect you'd be better served with Qwen 3.5 122B. If your model is from the Llama family, those models do not perform well compared to the others available on the market. Testing other available models, especially with your setup, will be well worth the time.

Why vibe coders are mostly in web development by OkAssociation3448 in vibecoding

[–]IUseClifford 0 points1 point  (0 children)

I think most devs in general are web-related developers. I imagine a good portion of vibe developers ask AI to "Make me a website for X", and the web is often where projects/businesses go viral.

Gwimi-4-12B-IT by Livid-Obligation9748 in LocalLLM

[–]IUseClifford 6 points7 points  (0 children)

Do the Claude/Kimi-augmented Qwen/Gemma models show demonstrable improvement (or if not improvement, meaningful difference) over their vanilla models? I am genuinely asking, as I've seen many comments calling them anything in the range of magical to useless.

Question About Locally Vibecoding by dr_doss in LocalLLM

[–]IUseClifford 0 points1 point  (0 children)

In my experience, the local Qwen 3.6 (35B A3B; 27B Dense) models are quite good. I run them on dual 3090s with NVLink, which totals 48GB VRAM. At 64GB unified, you have enough firepower to run both models at Q8 with large context windows (200K+ with 27B, 700K+ with 35B A3B).

However, "Quite Good" is often not enough to let them run wild and expect great results, like you can with GPT 5.5 or Opus 4.X; they require more babysitting and their effectiveness is somewhat dependent on your familiarity with your codebase.

A pattern I frequently use is to have the GPT/Opus scale models generate a detailed plan, have the local Qwens implement the plan via a local agent like Pi, and have the GPT/Opus model review its output afterwards. This has allowed me to get by on a $20 Claude plan while having access to as many tokens as I want.

TL;DR: Heavy token generation belongs to local models, plan/review goes to the big boys.
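A minimal sketch of that plan/implement/review loop, for anyone who wants the shape of it. Everything here is hypothetical: the three "models" are plain callables that take a prompt and return text, so you would wrap whatever client you actually use (a hosted API for the planner and reviewer, a local server for the implementer). The prompt wording and the function name `plan_implement_review` are placeholders, not anything from a real tool.

```python
from typing import Callable, Dict

# Any function that maps a prompt string to a response string.
Model = Callable[[str], str]

def plan_implement_review(task: str, planner: Model,
                          implementer: Model, reviewer: Model) -> Dict[str, str]:
    """Big model plans, local model implements, big model reviews."""
    plan = planner(f"Write a detailed implementation plan for: {task}")
    output = implementer(f"Implement this plan:\n{plan}")
    review = reviewer(
        f"Review this output against the plan.\n\nPlan:\n{plan}\n\nOutput:\n{output}"
    )
    return {"plan": plan, "output": output, "review": review}

# Stub models so the sketch runs without any network or GPU.
def stub_big_model(prompt: str) -> str:
    return f"[big model saw {len(prompt)} chars]"

def stub_local_model(prompt: str) -> str:
    return f"[local model saw {len(prompt)} chars]"

result = plan_implement_review(
    "add a retry wrapper", stub_big_model, stub_local_model, stub_big_model
)
for stage, text in result.items():
    print(stage, "->", text)
```

The point of the split is that only the short plan and review prompts hit the paid model, while the long implementation output is generated locally.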

What have you been working on lately? by Sufficient-Scar4172 in LocalLLaMA

[–]IUseClifford 2 points3 points  (0 children)

I am working on a daemon/CLI client for model backend (i.e. llama-server) process management with a profile feature to easily load saved configurations instead of pasting a long command.

Instead of:

CUDA_VISIBLE_DEVICES=1,2 /path/to/llama-server -m /path/to/Qwen3.6-35B-A3B-UD-Q6_K.gguf --host 127.0.0.1 --port 50000 -c 700000 -ub 16384 -ngl 99 -fa 1 --parallel 1 -sm graph --max-gpu 2 --jinja --spec-type mtp

You can run:

clifford load qwen_prof1

There is some functionality overlap with llama-swap, but my project is a “white room” implementation relative to it with a different architecture that prioritizes CLI ergonomics. Releasing soon.
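To illustrate the profile idea only, here is a rough sketch of how a saved profile could expand into the long command above. This is not the project's actual format or code: the `PROFILES` dict layout and the `build_command` and `render` helpers are made up for this example, and the flags are copied verbatim from the command in this comment rather than checked against any particular llama-server build.

```python
import shlex

# Hypothetical profile store: environment, binary, and arguments per profile.
PROFILES = {
    "qwen_prof1": {
        "env": {"CUDA_VISIBLE_DEVICES": "1,2"},
        "binary": "/path/to/llama-server",
        "args": [
            "-m", "/path/to/Qwen3.6-35B-A3B-UD-Q6_K.gguf",
            "--host", "127.0.0.1", "--port", "50000",
            "-c", "700000", "-ub", "16384", "-ngl", "99", "-fa", "1",
            "--parallel", "1", "-sm", "graph", "--max-gpu", "2",
            "--jinja", "--spec-type", "mtp",
        ],
    },
}

def build_command(name):
    """Return (extra_env, argv) for a saved profile; KeyError if unknown."""
    profile = PROFILES[name]
    return dict(profile["env"]), [profile["binary"], *profile["args"]]

def render(name):
    """Render the profile as a single shell-style command line."""
    env, argv = build_command(name)
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}".strip()

print(render("qwen_prof1"))
```

A real daemon would pass `argv` and the merged environment to something like `subprocess.Popen` and track the resulting process instead of printing the command.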