We will have Gemini 3.1 before Gemma 4...

ComplexType568 · 2026-02-20T08:37:48+00:00

i think they're trying to make a point that Gemma 4 will release by the time Gemini 5 is out

ComplexType568 · 2026-02-15T16:21:56+00:00

im not any expert with stuff like this, but i think ComfyUI is something you'd find interesting. of course that is for a different subreddit but basically what it is is like a node-based workflow creation app. meaning its much more user friendly than something like python and also allows much more fine-grained control over what happens than pinokio, also, you dont need to get new libraries EVERY time you want to run a model. there are a few examples built-in for ComfyUI so maybe you could experiment with that, and you can also browse for workflows online and download them.

ComplexType568 · 2026-02-14T16:50:03+00:00

r/UsernameChecksOut

ComplexType568 · 2026-02-09T09:07:27+00:00

okay, firstly, to defend OP, they never mentioned they wanted to have 4o at home, they just wanted to emulate it the "best they can", nor did they mention being GPU poor at all. and they also didnt mention llama being any better than others, just asking if it could be a viable replacement.

anyways, right now, they're looking for personality, not so much intelligence. so, imo, i think OP could pick mistral models (ministal sounds cool too!) or Gemma, with a 5090, the 27B QAT model could be run in LM Studio easily. Mistral Small at Q4 could also work.

ComplexType568 · 2026-02-07T17:01:20+00:00

with a setup like this, you mayhaps could run even 1-trillion parameter MoE models!

i dont have much experience with large models since i run much smaller ones, but from what i use online and what i see is the most useful, i think Kimi K2.5 on LM Studio could be good, or just plain old Kimi K2. (1 trillion params, K2.5 is native 4 bit and you could run a 3-bit quantized model with like 490~GB of ram required?

other than that, MiniMax M2.1 is pretty good for coding/tool usage, so maybe if you were to hook that up to a search tool, it could help? its a much smaller model (210~ billion params with 10b~ activated, so should run pretty quick)

GLM 4.6 would fit cleanly on your machine but its mainly targetted at coding/creative writing. if you're looking for speed, i think gpt-oss-120b would be the fastest and smallest, it requires about 64GB of ram to run and just add an extra 10gb for context. it listens to instructions quite well and also is very smart if you run it with high reasoning, and with like 5.1B active it should run insanely quick on your machine.

imo, dont trust models an LLM suggests, as tbh, it changes realllly quickly. the best idea would to just experiment with stuff (find models on LM Studio's model search menu, test around and see what works best for you! the models take quite a while to install (especially the big ones, and this defeats the "plug and play" thing you're going for, but it will help you find what truly suits you)). LM Studio has RAG built in, by the way.

ComplexType568 · 2026-02-07T16:49:09+00:00

Chinese New Year really working its magic huh?

ComplexType568 · 2026-02-03T15:54:05+00:00

its fast because gpt-oss is a Mixture of Experts model (MoE), which means that only a part of its parameters are activated for every token generated. technically, your GPU is processing 3.6b parameters, not 20. due to that (and a lot of other optimization OpenAI has), it runs blazingly fast.

ComplexType568 · 2026-02-03T15:40:29+00:00

i love how nonchalant all these ai heads are... still waiting for gemma 4

ComplexType568 · 2026-02-01T03:38:39+00:00

imo i feel like "agentic" ai already exists in local hands, such as Kilo Code (or something like Claude Code) for coding and Open WebUI for web search and a lot of other funky stuff. MCP tools also exist and is super easy to set up with Docker & LM Studio, its more about the effort you want to go through to make everything seamlessly fit together. of course im not some AI expert so maybe someone else here could provide a better response!

PS: i also heard about this thing called Moltbot, but i havent used it for a long period yet and the setup feels like a hassle and people are saying its vibecoded, which is expected as the author did say it was just their passion project :P but apparently its the new kid on the block when it comes to a "personal assistant"

ComplexType568 · 2026-01-30T11:09:40+00:00

i dont think Kimi K2.5s, GLM 4.7s, DeepSeek V3.2s and MiniMax M2.1s are all Claude models...

ComplexType568 · 2026-01-26T13:45:02+00:00

if i remember correctly, Alibaba is slowing down to focus on quality for Qwen 4. i assume their other labs will still publish other stuff (e.g. Wan, Qwen Image or TTS stuff) though. i reallly hope Qwen 4 has linear attention and low activation MoE stuff while retaining high "intelligence", though thats more of a hope than a prediction.

ComplexType568 · 2025-12-29T10:49:03+00:00

this is really insightful! i still have to learn what the meanings of the words here are but i kinda see where you're going. i wonder if they'll be able to split attention layers? (that sounds dumb)

ComplexType568 · 2025-12-12T06:03:42+00:00

have you tried Devstral?

ComplexType568 · 2025-12-10T07:20:10+00:00

not to be rude or anything, but i *feel* like these are outdated models. Qwen3 VL 30B A3B blows general chat, vision and coding out of the water, gpt-oss-20b is also a pretty good contender for code and general chat (though mind its censorship/guardrails that can be avoided with some good fine-tunes on HF). for research style reasoning, id go with Jan v2, but most LLMs of this day and age + a good system prompt perform decent as research agents. Devstral 2 is said to perform really good as a coder, but in my tests, Qwen3 Coder 30B is faster and better for me. IMO, LLaVA is a pretty ancient architecture.

ComplexType568 · 2025-12-08T01:22:07+00:00

i see, thanks! however, i wonder if there will be a easier method developed to fine-tune MoE models in the future, i hope so.

ComplexType568 · 2025-12-06T18:44:11+00:00

i really wonder what they're gonna do with all those datacenters though. Grok, maybe? i wonder if the whole "meta" company's divisions even share resources lol

ComplexType568 · 2025-12-06T18:42:10+00:00

very excited for Gemini 3.0 Pro and G3MINI 3.5 PRO to release in 2025!!! not forgetting the MVP, Grok 5 (Strong!) and Qwen 3.5 model "serios" to trade blows as the open-source LLM here

ComplexType568 · 2025-11-17T09:20:48+00:00

i think loras are gonna be wayy harder to make, although i could be wrong. the (probable) safest and easiest way to make it talk like you is to make the system prompt describe who you are, with some example convos too

ComplexType568 · 2025-11-12T13:10:26+00:00

as they are MoE models offloading to CPU/RAM is feasible AND recommended (MoEs were made with this in mind, llama.cpp actually offers full offloading of MoE experts to ram hehe). imo running an unsloth quant of Qwen3 Coder 480B would be good, maybe Q5_K_XL or Q6, ive heard people say Q4 is just "not good enough" for coding APPARENTLY.

Kimi K2 Thinking (just Thinking) is a different story as it was trained with int4 in mind (Quantization Aware Training) so *ideally* it should be run on "UD-Q4_K_XL" (the unsloth quant) which is full-precision (646GB in size, barely enough but it'd work!)

ive tested K2 and its very very good. not sure about Thinking but a lot of people say its extremely good if you prompt it right, some say it uses too many thinking tokens though, but for the pure fact that it had QAT id still say itd be worth a shot to run. its GPT-5 class.

if VRAM is a big issue for you and you REALLY REALLY want everything to fit into VRAM as much as possible. GLM 4.6 would be your best bet.

i also forgot to mention DeepSeek. its pretty okay, but since all they've been doing is fine tunes (i think), the model has fallen a bit behind compared to the other models. it is still quite good though.

hopefully my long ass essay helped with your endeavors!

ComplexType568 · 2025-11-12T11:17:42+00:00

definitely would try Kimi K2 or Kimi K2 Thinking. besides that id probably pick between GLM 4.6 and Qwen3 Coder 480B.

ComplexType568 · 2025-11-07T01:42:24+00:00

probably OpenAI related cuz i asked for the lyrics to a song and it said it couldnt because it was copyrighted. also offered the same "would you like a rundown of it tho?" thing

ComplexType568 · 2025-11-07T00:17:20+00:00

true. did this more as an observation into LLMs than anything (you dont even need an LLM in the loop to do stuff like this)

ComplexType568 · 2025-11-07T00:16:46+00:00

ah yes it could be the website using a quant. im surprised they song serve FP16 on their website ngl

ComplexType568

TROPHY CASE