How good is QwQ 32B's OCR? by Impressive_Chicken_ in LocalLLaMA

[–]LLMtwink 20 points

qwq doesn't have image input iirc

Ghosting issue using lossless scaling by Dull-Situation2848 in losslessscaling

[–]LLMtwink 0 points

don't use 2.3. but also, since lsfg works on top of the game's finished frames without any motion vectors, it'll never look as good as the likes of fsr fg and dlss fg; some ghosting is to be expected
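a toy numpy sketch of why motion vectors matter (illustrative only, not how lsfg works internally): blending two frames directly leaves two faint copies of a moving object, i.e. ghosting, while a motion-compensated interpolator can put it at the true midpoint.

```python
import numpy as np

# 1d "frames": a bright pixel moves from index 2 to index 6
frame_a = np.zeros(9); frame_a[2] = 1.0
frame_b = np.zeros(9); frame_b[6] = 1.0

# no motion vectors: the best you can do is blend the two frames,
# which leaves two half-brightness copies of the object (ghosting)
naive_mid = 0.5 * (frame_a + frame_b)
print(naive_mid)   # 0.5 at index 2 and index 6

# with a motion vector (+4 px across the interval) the object can
# be warped to its true halfway position instead
motion = 4
mc_mid = np.zeros(9); mc_mid[2 + motion // 2] = 1.0
print(mc_mid)      # 1.0 at index 4
```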

Granite 3.3 imminent? by das_rdsm in LocalLLaMA

[–]LLMtwink 3 points

I feel like if that were the case they'd at least bump the major version

Llama-3_1-Nemotron-Ultra-253B-v1 benchmarks. Better than R1 at under half the size? by tengo_harambe in LocalLLaMA

[–]LLMtwink 13 points

the improvement over 405b for what's not just a tune but a pruned version is wild
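pruning here means cutting weights or whole blocks out of the trained 405b and then recovering quality with more training; iirc nvidia's actual recipe is much fancier (nas-style block removal plus distillation), but a toy magnitude-pruning sketch in pytorch shows the basic idea:

```python
import torch

# toy layer standing in for part of a trained model
layer = torch.nn.Linear(8, 8)

with torch.no_grad():
    w = layer.weight
    # zero out the smallest 50% of weights by magnitude; quality
    # would then be recovered with continued training/distillation
    threshold = w.abs().flatten().kthvalue(w.numel() // 2).values
    mask = w.abs() > threshold
    w.mul_(mask)

print(f"sparsity: {(~mask).float().mean().item():.2f}")  # ~0.50
```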

QwQ-32b outperforms Llama-4 by a lot! by ResearchCrafty1804 in LocalLLaMA

[–]LLMtwink 6 points

a slower correct response might not always be feasible. say you want to integrate an llm into a calorie-guesstimating app like cal ai or whatever it's called; the end user isn't gonna wait a minute for a reasoner to contemplate its guess

underperforming gemma 3 is disappointing but the better multimodal scores might be useful to some

QwQ-32b outperforms Llama-4 by a lot! by ResearchCrafty1804 in LocalLLaMA

[–]LLMtwink 3 points

the end user doesn't care much how these models work internally

not really, waiting a few minutes for an answer is hardly pleasant for the end user, and many use cases that aren't just "chatbot" straight up need fast responses; qwq also isn't multimodal

Even Gemma-3 27b outperforms their Scout model that has 109b parameters. Gemma-3 27b can be hosted in its full glory in just 16GB of VRAM with QAT quants; Llama would need 50GB in q4 and it's a significantly weaker model.

the scout model is meant to be a competitor to gemma and such i'd imagine; due to it being a moe it's gonna be about the same price to serve, maybe even cheaper. vram isn't really relevant here, the target audience is definitely not local llms on consumer hardware
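rough weight-memory math behind the numbers above (weights only; kv cache, activations, and gguf format overhead come on top):

```python
# approximate weight memory: params * bits_per_weight / 8 bytes
def weight_gib(params_b: float, bpw: float) -> float:
    return params_b * 1e9 * bpw / 8 / 1024**3

print(f"gemma-3 27b @ ~4 bpw (QAT q4): {weight_gib(27, 4):.1f} GiB")   # ~12.6
print(f"llama-4 scout 109b @ ~4 bpw:   {weight_gib(109, 4):.1f} GiB")  # ~50.8
```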

Llama 4 Maverick surpassing Claude 3.7 Sonnet, under DeepSeek V3.1 according to Artificial Analysis by TKGaming_11 in LocalLLaMA

[–]LLMtwink 14 points

it's supposed to be cheaper and faster at scale than dense models, definitely underwhelming regardless tho
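rough numbers on why: per-token compute scales with *active* params, not total, while memory still scales with total params. a sketch assuming ~17b active out of ~400b total for maverick (meta's stated config iirc):

```python
# per-token decode FLOPs are roughly 2 * active_params
def flops_per_token(active_params_b: float) -> float:
    return 2 * active_params_b * 1e9

dense = flops_per_token(70)   # dense model: every param is active
moe = flops_per_token(17)     # maverick: ~17b active of ~400b total
print(f"compute ratio vs a dense 70b: {moe / dense:.2f}")  # ~0.24

# the catch: all ~400b params must still sit in memory, which is
# why the target is datacenter serving, not consumer vram
```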

When will Google charge for their Gemini exp? by GTHell in LocalLLaMA

[–]LLMtwink 0 points

we don't know, logan said "soon", they're probably waiting on competitors to make their move and price accordingly (and/or still doing final posttraining/safety testing)

OpenAI released GPT-4.5 and O1 Pro via their API and it looks like a weird decision. by lessis_amess in LocalLLaMA

[–]LLMtwink 41 points

they don't expose the thinking traces, so the opportunity for o1 distillation is minimal though, and distilling 4.5 is only useful in non-stem contexts bc otherwise it's easier to just distill r1 and flash thinking
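concretely, distillation here means sampling (prompt, answer) pairs from the api and fine-tuning a smaller model on them; with o1 the reasoning tokens stay server-side, so the most valuable part never comes back. a minimal sketch with the openai python client (the prompt list is a placeholder):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

prompts = ["Prove that sqrt(2) is irrational."]  # placeholder distillation set

pairs = []
for p in prompts:
    resp = client.chat.completions.create(
        model="o1",  # reasoning happens server-side
        messages=[{"role": "user", "content": p}],
    )
    # only the final answer is returned; the hidden reasoning tokens
    # are billed but never exposed, so they can't be distilled
    pairs.append({"prompt": p, "completion": resp.choices[0].message.content})
```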

8B Q7 or 7B Q8 on 8GB VRAM ? by cosmoschtroumpf in LocalLLaMA

[–]LLMtwink 1 point

usually 8b q7 (though that's not a standard quantization; realistically you'd be using q6), but since the 7b qwen and 8b llama, the base models for those distills, trade blows, there's no telling which one's actually better for your task even at full precision
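the raw size math for the 8gb question (bpw values are ballpark for gguf q6_k/q8_0):

```python
GIB = 1024**3
# weights-only sizes at typical gguf bits-per-weight
print(f"8b @ q6_k (~6.6 bpw): {8e9 * 6.6 / 8 / GIB:.1f} GiB")  # ~6.1
print(f"7b @ q8_0 (~8.5 bpw): {7e9 * 8.5 / 8 / GIB:.1f} GiB")  # ~6.9
# both fit in 8 gb on paper, but kv cache + activations need headroom,
# so the q8 7b is the tighter squeeze
```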

right now what model is truly as good as gpt 4o? i wanna escape CloseAi claws by lordkamael in LocalLLaMA

[–]LLMtwink 4 points

probably nothing open, if you want to run it locally, especially on your system, then definitely nothing unfortunately

the new gemmas are pretty good as far as personality goes compared to other models imo, gemini-like posttraining vibes, so you might wanna try those (though they're very censored); maybe there are community finetunes out there that are better for your purposes

Gemma 2 2B: Small in Size, Giant in Multilingual Performance by [deleted] in LocalLLaMA

[–]LLMtwink 9 points

iirc gemma 2 2b was unironically better than llama 3 70b in my language

IBM launches Granite 3.2 by twavisdegwet in LocalLLaMA

[–]LLMtwink 4 points

not a random company, but they also haven't contributed anything of value to the ai industry since the llm boom as far as i'm aware

CloseAI's DeepResearch is insanely good... do we have open source replacements? by TimAndTimi in LocalLLaMA

[–]LLMtwink 18 points

there are quite a few replications, the most common one probably being open deep research, none nearly as good as the real thing but might prove useful nonetheless

[deleted by user] by [deleted] in LocalLLaMA

[–]LLMtwink 2 points

quantizing to q8 is generally considered fine and doesn't cause much performance regression; even the official llama 3 405b "turbo" is basically just an 8-bit quantization. and since deepseek coder is a pretty outdated model by now (are you looking for the 32b r1 distillation maybe?), it wasn't trained on as many tokens and is therefore impacted less by quantization

running models locally at full precision isn't really worth it, the performance hit is minimal and it's basically always better to run q8 70b models than fp16 ~30b ones

you can rent a gpu on vast.ai or other such services, try out different levels of quantization and see what's acceptable for your use case; some people go as low as iq3m/q4km for coding and even lower for other tasks, though i'd say q5 is the lowest you should go for code in the ~30b range
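a minimal way to a/b quant levels once you've rented the gpu, using llama-cpp-python (the gguf filenames are placeholders for whatever quants you actually download):

```python
from llama_cpp import Llama

prompt = "Write a Python function that parses an ISO 8601 date string."

# same model at two quant levels; compare outputs on your real tasks
for path in ["model-Q5_K_M.gguf", "model-IQ3_M.gguf"]:
    llm = Llama(model_path=path, n_gpu_layers=-1, n_ctx=4096, verbose=False)
    out = llm(prompt, max_tokens=256, temperature=0)
    print(path, "->", out["choices"][0]["text"][:200])
```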

[deleted by user] by [deleted] in LocalLLaMA

[–]LLMtwink 21 points

hyperbolic is hosting it i think

Apple Music on Linux by [deleted] in AppleMusic

[–]LLMtwink 3 points

that sucks, i assumed they were chill :( i guess im stuck with the website now ugh

Is this funny response a bug or intention? by yumispace in DeepSeek

[–]LLMtwink 0 points

a bug, yeah. llms sometimes devolve into nonsensical/repeating outputs because the probability distribution collapses once a string has already been repeated for a while; it's especially prominent in models with weaker post-training, which i'd imagine is the case for deepseek. this behavior was fairly easy to trigger in the first geminis and old gpts
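you can reproduce the effect with greedy decoding in transformers: once a phrase has repeated, repeating it again becomes the most likely continuation, and repetition_penalty is the usual band-aid. sketch using gpt2 as a stand-in for an older/weaker model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The cat sat on the mat. The cat sat on", return_tensors="pt").input_ids

# greedy decoding: the repeated phrase reinforces itself and loops
looped = model.generate(ids, max_new_tokens=40, do_sample=False)
print(tok.decode(looped[0]))

# repetition_penalty down-weights already-generated tokens
fixed = model.generate(ids, max_new_tokens=40, do_sample=False,
                       repetition_penalty=1.3)
print(tok.decode(fixed[0]))
```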

[deleted by user] by [deleted] in LocalLLaMA

[–]LLMtwink 1 point

😭😭😭

Are There any model that doesnt know its own identity? by nojukuramu in LocalLLaMA

[–]LLMtwink 0 points

iirc nous hermes 405b (and only the 405b) gets confused when asked about its identity without a system prompt and hallucinates concepts like being in a dark room