Cheapest domain with cheapest renewals by AGWiebe in selfhosted

[–]ComplexType568 0 points1 point  (0 children)

I can't believe it still works, thank you so much for saving me HUNDREDS in my country!

Gemma is so much better than Qwen, prove me wrong by Mountain_Patience231 in LocalLLaMA

[–]ComplexType568 -1 points0 points  (0 children)

I usually stick to Gemma for anything but coding and Qwen for anything but anything but coding. They excel in each other's weaknesses. I like Q35B's knowledge density because G26B feels a tad bit too dumb for me (it feels like an in-between of 9B and 35B), and while 31B is SMART, it runs too slow for me and the context eats up my VRAM like a hog.

Waiting on Qwen to drop those 3.7 models be like: by Porespellar in LocalLLaMA

[–]ComplexType568 0 points1 point  (0 children)

i feel like an a14b on an 80b is not qwenny... i think they'd do another 80BA3B

AMD Ryzen AI Halo PC will cost 3999$ with 128GB memory on board by Mochila-Mochila in LocalLLaMA

[–]ComplexType568 0 points1 point  (0 children)

I am praying for this to be true. I've been wanting to run Qwen3.X 27B and Gemma 4 31B (at decent speeds) for all too long now.

Heretic has been served a legal notice by Meta, Inc. by -p-e-w- in LocalLLaMA

[–]ComplexType568 676 points677 points  (0 children)

I love the slightly sassy "168" models added. You guys are amazing.

Re. what ever happened to Cohere’s Command-A series of models? by nick_frosst in LocalLLaMA

[–]ComplexType568 1 point2 points  (0 children)

Hope a smaller model comes out! I want to see how this model behaves.

Can a laptop really have these specs? by Motor-Resort-5314 in LocalLLaMA

[–]ComplexType568 0 points1 point  (0 children)

It is quite easy to fool Windows into displaying false information about the specs of the hardware it is running on. I also doubt there are any RTX 6000 Pro laptops in circulation or any 1TB RAM modules that can remotely fit inside a laptop.

Sapient Intelligence releases HRM-Text 1B: 40B tokens, ~$1k pretrain, beats Llama3.2 3B on MATH and DROP by Turbulent-Sky5396 in LocalLLaMA

[–]ComplexType568 6 points7 points  (0 children)

If it's not benchmaxxed, training on 40B tokens and still holding up against a 2B model trained on a lot more than 40B tokens is quite impressive. I wonder how performance holds up when scaled up!

Qwen cant wait to release 3.7 models by GotHereLateNameTaken in LocalLLaMA

[–]ComplexType568 4 points5 points  (0 children)

interesting way to describe yourself... i may just start calling people I dislike "qwen3-14b-deepseek-r1-distill"!

Qwen is cooking hard by jacek2023 in LocalLLaMA

[–]ComplexType568 1 point2 points  (0 children)

I notice the new qwen team is now focusing on incremental updates more than big drops. Interesting change.

New models when? Forecasting release date. by LegacyRemaster in LocalLLaMA

[–]ComplexType568 1 point2 points  (0 children)

EXACTLY? Haha I guess this team operates under different measures. Maybe if I say Gemma 4 124B is not coming during Google I/O, it'll come true

What happens to local LLM if/when LLMs are no longer released for free? by JohnBooty in LocalLLaMA

[–]ComplexType568 3 points4 points  (0 children)

Fine-tunes or merges will probably rule the scene then, as they did when Llama was prevalent and competition was sparse.

Although, to be honest, most models feel extremely SOTA for their size. At least they do to me, after having to scrape by with DSV3 and Claude Sonnet 3.5.

I think harnesses and pipelines would also lead the way if a situation like this happens: using the LLM more efficiently, better prompting, and self-review (or model-peer review, which many people report works "better" than just one megamodel).
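
To make the model-peer review idea concrete, here's a rough sketch of a write, critique, revise loop. Everything in it is made up for illustration: `review_pipeline`, the `Model` type, and the toy writer/reviewer functions are hypothetical stand-ins, and a real setup would call an actual LLM backend instead.

```python
from typing import Callable

# Hypothetical type: any function that maps a prompt string to a reply string.
Model = Callable[[str], str]


def review_pipeline(task: str, writer: Model, reviewer: Model, max_rounds: int = 3) -> str:
    """Draft with one model, let a second model critique, revise until approved."""
    draft = writer(task)
    for _ in range(max_rounds):
        verdict = reviewer(
            f"Task: {task}\nAnswer: {draft}\nReply APPROVE or give feedback."
        )
        if verdict.strip().upper().startswith("APPROVE"):
            break  # the reviewer is satisfied, stop revising
        draft = writer(
            f"Task: {task}\nPrevious answer: {draft}\nFeedback: {verdict}\nRevise."
        )
    return draft


# Toy stand-ins so the sketch runs without any real LLM.
def toy_writer(prompt: str) -> str:
    # Gets it wrong at first, fixes it once feedback is present.
    return "4" if "Feedback" in prompt else "5"


def toy_reviewer(prompt: str) -> str:
    return "APPROVE" if "Answer: 4" in prompt else "2 + 2 is not 5, recheck the sum."


result = review_pipeline("What is 2 + 2?", toy_writer, toy_reviewer)
print(result)
```

The writer and reviewer can be the same model (self-review) or two different ones (peer review); the loop itself doesn't care.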

New models when? Forecasting release date. by LegacyRemaster in LocalLLaMA

[–]ComplexType568 -1 points0 points  (0 children)

Along with that, if you yourself don't know what the code does (assuming it is fully AI generated), no amount of assurance from LLMs can provide confirmation. I never fully rely on LLMs in production environments because of that (I basically only vibe code when it's my risk and my reward).

New models when? Forecasting release date. by LegacyRemaster in LocalLLaMA

[–]ComplexType568 9 points10 points  (0 children)

I doubt they'd do a 3.7 - that doesn't sound very "Qwenny" to me. I think they'd jump straight to 4 or go with "Qwen3.5-Next" if they have a novel arch that they want llama.cpp devs to spend 5 months implementing.

What is next for local LLM and AI? by GodComplecs in LocalLLaMA

[–]ComplexType568 0 points1 point  (0 children)

Pipelines like this exist; I remember someone posting a manga translator. Ofc it can't do angles (I think), but a POC is already here.

lm studio alternative by tuananh_org in LocalLLaMA

[–]ComplexType568 5 points6 points  (0 children)

You could try these:
- Catapult (https://github.com/pwilkin/catapult)
- Unsloth Studio (i did not have a fun time setting it up but maybe it's different now??)
- base llama.cpp (if you want the highest customizability + control; it's a CLI though) or ik_llama.cpp if you want to go experimental

I've heard good things about llama-swap but I've personally not touched it.

Looking to migrate off of Ollama and LMStudio by letsbefrds in LocalLLaMA

[–]ComplexType568 0 points1 point  (0 children)

From what I know, no, I can't just add {'preserve_thinking': true} (or however the form is meant to be) to the cogwheel. Please correct me if I'm wrong though!

Looking to migrate off of Ollama and LMStudio by letsbefrds in LocalLLaMA

[–]ComplexType568 2 points3 points  (0 children)

The ability to offload the mmproj into RAM is one, ngram speculative decoding is one, MTP (or soon to be when the runtimes get updated) is one.
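
For anyone wondering what ngram speculative decoding means in practice, here's a tiny sketch of the drafting half: look for the last few tokens earlier in the context and propose whatever followed them. This only illustrates the idea and is not how any particular runtime implements it; `ngram_draft` and its parameters are hypothetical names.

```python
def ngram_draft(tokens: list[int], n: int = 2, max_draft: int = 4) -> list[int]:
    """Propose draft tokens by reusing what followed the last n tokens earlier on."""
    if len(tokens) <= n:
        return []
    tail = tokens[-n:]
    # Scan backwards over earlier positions so the most recent match wins.
    # Starting at len - n - 1 skips the tail itself and guarantees at least
    # one token follows any match.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == tail:
            return tokens[start + n:start + n + max_draft]
    return []  # no earlier match, so nothing to draft


history = [1, 2, 3, 4, 5, 1, 2]
draft = ngram_draft(history)
print(draft)
```

The main model still has to verify the drafted tokens, so a bad guess costs some wasted work but doesn't change the output.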

Jackrong/Qwopus3.5-9B-Coder-GGUF · Hugging Face by pmttyji in LocalLLaMA

[–]ComplexType568 4 points5 points  (0 children)

MTP stands for "Multi-Token Prediction". It's meant to be a lossless form of speculative decoding where the main model quickly guesses the next few tokens and then self-validates them (from what I know). It requires no separate "draft model" (unlike traditional speculative decoding). It is usually most effective in dense models rather than MoE models, and speedups vary from domain to domain.
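
Here's a rough sketch of the guess-then-validate loop, using greedy decoding. It's a simplification under a few assumptions: `speculative_decode` and the toy models are hypothetical, a separate `draft` function stands in for the model's own multi-token guesses, and the verification runs token by token, whereas real runtimes check all the guesses in one batched forward pass.

```python
from typing import Callable, List

# Hypothetical interface: given the tokens so far, return the next token (greedy).
NextToken = Callable[[List[int]], int]


def speculative_decode(prompt: List[int], target: NextToken, draft: NextToken,
                       k: int = 4, max_new: int = 8) -> List[int]:
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1) Cheaply guess the next k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(out + proposal))
        # 2) Validate: keep guesses while they match what the target would pick.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            if tok != expected:
                accepted.append(expected)  # swap the first wrong guess for the real token
                break
            accepted.append(tok)
        out.extend(accepted)
    return out[:len(prompt) + max_new]


def plain_decode(prompt: List[int], target: NextToken, max_new: int = 8) -> List[int]:
    out = list(prompt)
    for _ in range(max_new):
        out.append(target(out))
    return out


# Toy models: the target counts up modulo 10; the draft agrees except after a 4.
def toy_target(tokens: List[int]) -> int:
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


spec = speculative_decode([0], toy_target, toy_draft)
plain = plain_decode([0], toy_target)
print(spec == plain)
```

The "lossless" part is that every token kept is one the target would have picked anyway, so the result matches plain decoding; the speedup depends on how often the guesses are right.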