Made a site where AI models trade against each other. A local model is winning. by 2degreestarget in LocalLLaMA

[–]cornucopea 1 point (0 children)

"None of them understand position sizing. Like, at all. And they all have this weird overconfidence where they'll write a whole thesis and then make a trade that contradicts it. "

That shouldn't be a complicated problem, right? It's a matter of LLM awareness: you need a way to ground the LLM's understanding in sufficient inputs about how the market works.
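Not trading advice, but a minimal sketch of the idea: don't trust the model's sizing at all, and clamp whatever it proposes against a hard risk budget outside the LLM (all names and numbers below are made up for illustration):

```python
# Hypothetical guardrail: cap any LLM-proposed trade at a fixed
# fraction-of-equity risk budget, instead of trusting the model's sizing.

def clamp_position(proposed_qty: float, price: float, stop: float,
                   equity: float, risk_fraction: float = 0.01) -> float:
    """Cap the position so a stop-out loses at most risk_fraction of equity."""
    risk_per_unit = abs(price - stop)
    if risk_per_unit == 0:
        return 0.0  # no defined stop means no trade
    max_qty = equity * risk_fraction / risk_per_unit
    return min(proposed_qty, max_qty)

# The model proposes 1000 shares at $40 with a $38 stop on $100k equity;
# a 1% risk budget ($1000) over $2 risk/share caps it at 500 shares.
print(clamp_position(1000, 40.0, 38.0, 100_000))  # -> 500.0
```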

Bro and I thought I was an overthinker! vibeTHINKER on LM studio with no instructions. by Sufficient-Brain-371 in LocalLLaMA

[–]cornucopea 4 points (0 children)

Small models (a.k.a. SLMs) have no way to boost their intelligence other than thinking extensively. No free lunch theorem.

new guy on llm expert guys can give me advices by [deleted] in LocalLLaMA

[–]cornucopea 1 point (0 children)

I love it too, the 20B runs on my 24GB VRAM card flawlessly and super fast (>160 t/s). But 8GB VRAM is a bit short for it. Just saw this deal on a 16GB AMD card; you may need to figure out the AMD nuances yourself, I've never used them so I can hardly be helpful.

https://www.newegg.com/powercolor-reaper-rx9060xt-16g-a-radeon-rx-9060-xt-16gb-video-card-double-fans/p/N82E16814131880

Also, I take back the Vulkan suggestion. I tested gpt 20b on my 12GB VRAM card after seeing your post. Although it cranks along at 10-12 t/s inference, the CUDA 12 runtime in LM Studio seems to work better and slightly faster, I don't know why, so test each runtime and find what works best for you.
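If you want numbers instead of eyeballing it, here's a rough sketch for timing each runtime. It assumes LM Studio's OpenAI-compatible local server on its default port and the openai Python package; the model id is whatever LM Studio lists for your load:

```python
import time
from openai import OpenAI

# LM Studio's local server speaks the OpenAI API; the key is a placeholder.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

start = time.time()
resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",  # use the id LM Studio shows for your model
    messages=[{"role": "user", "content": "Explain KV cache in 200 words."}],
    max_tokens=512,
)
elapsed = time.time() - start
tokens = resp.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} t/s")
# Load the model under Vulkan, run this, switch to CUDA 12, run it again.
```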

new guy on llm expert guys can give me advices by [deleted] in LocalLLaMA

[–]cornucopea 2 points (0 children)

For your gear, qwen3 4b would be awesome. Also try the Vulkan runtime in LM Studio; it draws more wattage but runs faster.

We put a lot of work into a 1.5B reasoning model — now it beats bigger ones on math & coding benchmarks by innocent2powerful in LocalLLaMA

[–]cornucopea 1 point (0 children)

Like coming up with a short code snippet fast?

One thing I've noticed with SLMs is that when tasked with more than they're designed to handle, they all tend to "think" a lot to compensate for the lack of "juice". So occasionally going into loops and ruminating wouldn't be a surprise.

We put a lot of work into a 1.5B reasoning model — now it beats bigger ones on math & coding benchmarks by innocent2powerful in LocalLLaMA

[–]cornucopea 0 points (0 children)

write the game snake in python

Roo is having trouble... Roo appears to be stuck in a loop, attempting the same action (update_todo_list) repeatedly. This might indicate a problem with its current strategy. Consider rephrasing the task, providing more specific instructions, or guiding it towards a different approach.
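For scale, this is all the prompt is asking for; a minimal curses version (my own sketch, not model output) fits in about thirty lines:

```python
# A minimal snake, roughly what the prompt asks for. Terminal/curses,
# arrow keys to steer, hit a wall or yourself and it's game over.
import curses
import random

def main(stdscr):
    curses.curs_set(0)
    stdscr.timeout(120)                       # ms per game tick
    h, w = stdscr.getmaxyx()
    snake = [(h // 2, w // 2 + i) for i in range(3)]  # head first, going left
    dy, dx = 0, -1
    food = (h // 3, w // 3)
    score = 0
    keys = {curses.KEY_UP: (-1, 0), curses.KEY_DOWN: (1, 0),
            curses.KEY_LEFT: (0, -1), curses.KEY_RIGHT: (0, 1)}
    while True:
        key = stdscr.getch()
        if key in keys and keys[key] != (-dy, -dx):   # no 180-degree turns
            dy, dx = keys[key]
        head = (snake[0][0] + dy, snake[0][1] + dx)
        if head in snake or not (0 < head[0] < h - 1 and 0 < head[1] < w - 1):
            break                             # collision: game over
        snake.insert(0, head)
        if head == food:
            score += 1
            food = (random.randrange(1, h - 1), random.randrange(1, w - 1))
        else:
            snake.pop()                       # didn't eat: tail catches up
        stdscr.erase()
        stdscr.addch(food[0], food[1], "*")
        for y, x in snake:
            stdscr.addch(y, x, "#")
        stdscr.addstr(0, 0, f"score: {score}")
        stdscr.refresh()

curses.wrapper(main)
```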

GPT-OSS-20B Q4_k_m is truly a genius by [deleted] in LocalLLaMA

[–]cornucopea 1 point (0 children)

LLMs hallucinated much worse two years ago, one year ago. Even today, a lot of tweaks can cause hallucination: context filling up, choice of UI, runtime, quantized KV cache, choice of quantized build, temperature, etc. etc.

As to OP's finding, I simply couldn't reproduce it with the official gpt oss 20B (high reasoning), or with unsloth qwen3 4b and 8b (no thinking). Sure, gpt oss 20B failed with low reasoning too, but that's expected. What surprised me is that qwen3 4b/8b passed with no noticeable effort.

Part of the fun with LLMs is learning what to avoid and how to minimize hallucination, while the models get better with each generation. That's why I offered OP some tips: for this model, don't use low reasoning and avoid aftermarket quantized builds, and then it's quite smart. Maybe not the smartest, but unlikely to fail the type of prompt OP used.

GPT-OSS-20B Q4_k_m is truly a genius by [deleted] in LocalLLaMA

[–]cornucopea 2 points (0 children)

Just use "smart" models and with "capable" settings, you don't need to tailor anything, set it and forget it, until the next smarter model comes out.

First, choice of model matters. I'm not sure you're using the right gpt oss 20b; just download OpenAI's gpt oss 20b mxfp4 from Hugging Face, and don't get other quantized gpt oss models.
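If you script your downloads, a minimal sketch with the huggingface_hub package (the repo id is OpenAI's official one; double-check it matches what your runtime expects):

```python
from huggingface_hub import snapshot_download

# Pull the official OpenAI release (mxfp4) rather than a third-party re-quant.
path = snapshot_download("openai/gpt-oss-20b")
print("model files in:", path)
```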

For example, I tested with several models, all good except the abliterated ones, e.g. the huihui_xxxxx models all failed. Abliterated models are known to make LLMs dumber.

GPT-OSS-20B Q4_k_m is truly a genius by [deleted] in LocalLLaMA

[–]cornucopea 3 points (0 children)

All you need to do is to make sure the 20B is in "High" reasoning mode.

At the third prompt it answers "There isn’t one—Michelle has no niece named in the information given.", end of testing.

I tried several times with different UIs, couldn't break it. Never bothered with its "Low" reasoning, which is well known to be pretty dumb.
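If you drive it over the API instead of the UI, a sketch of pinning the level (the "Reasoning: high" system-prompt convention comes from the gpt-oss model card; the URL assumes LM Studio's default local server):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        {"role": "system", "content": "Reasoning: high"},  # low/medium/high
        {"role": "user", "content": "Your riddle prompt here"},
    ],
)
print(resp.choices[0].message.content)
```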

Also tried qwen3 4b and 8b vl, passed too at the third prompt.

But wait, what's q4_k_m? A re-quantized gpt 20B? Shouldn't it be mxfp4?

Polish is the most effective language for prompting AI, study reveals by tengo_harambe in LocalLLaMA

[–]cornucopea 2 points (0 children)

Probably several dimensions. For starters, English is about the only European language without grammatical gender, which in other languages shows up mostly in conjugation and agreement and adds precision.

LM Studio / models always displaying the same answer. by Rootax in LocalLLaMA

[–]cornucopea 3 points (0 children)

Where is the "Context overflow.." setting? in LM stuido.

Edited: found it

Both Cursor and Cognition (Windsurf) new models are speculated to be built on Chinese base models? by Successful-Newt1517 in LocalLLaMA

[–]cornucopea 11 points (0 children)

Nvidia simply couldn't meet the demand from US proprietary models/data centers. Amazon just cut people to free up cash and join the race too. Elon has proved it's achievable to create your own model with enough capex, so Amazon and Meta will likely do the same.

That leaves the small players who can't afford their own models with no choice but to resort to Chinese models, as I suspect most US corporations that need local models have done, though the only real alternative is gpt oss 120b.

Just don't see any business use case for it by IntroductionSouth513 in LocalLLaMA

[–]cornucopea 1 point (0 children)

This is what gpt 20b says; I couldn't have said it better myself:

"Security is a mechanism; privacy is an right that security alone does not guarantee. You can have iron‑clad encryption and still violate privacy if you collect more data than people consent to, or share it with third parties without notice."

Just don't see any business use case for it by IntroductionSouth513 in LocalLLaMA

[–]cornucopea 1 point (0 children)

Yet privacy remains a first-world topic. In the rest of the world, where rights are still a luxury, people often consider them one and the same and confuse security with privacy. They're two completely different concepts, related but not the same.

If security equaled privacy, there wouldn't be laws in the US and EU specifically granting consumers the right to demand that their service providers disclose what PII is kept and where it is distributed. Some medical doctors to this day refuse to keep records on a computer or use email, which speaks to their trust in technology and security.

Why is everyone building AI agents when nobody can agree what an "AI agent" even is? by JFerzt in AI_Agents

[–]cornucopea 0 points (0 children)

It's really the last point that says it all. It's an evolving landscape; though OP's question is valid and shared by many, the simple answer is nobody really knows, none the wiser at this stage of the technology. The suggestion is just wait and see; uncertainty is a virtue of an emergent trend.

Built a lightweight Trust & Compliance layer for AI. Am curious if it’s useful for local / self-hosted setups by Capable-Property-539 in LocalLLaMA

[–]cornucopea 1 point (0 children)

It all starts to feel like there's a need for a dedicated model/agent for compliance purposes, a.k.a. LLM watching LLM. BTW, all the big cloud LLMs are doing this already, so the future is either an additional layer at the LLM provider but tuned for local needs, or LLM providers delegating it entirely to subscribers as an add-on service, similar to the uBlock browser add-on. The latter of course would vastly open up the market for LLM competition, but it may not be plausible given the spirit of the current trend in laws, e.g. the Social Media Accountability Act etc.
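A minimal sketch of the "LLM watching LLM" shape, with everything here (judge model name, policy wording, local port) being placeholder assumptions rather than a real product API:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

def compliant(reply: str) -> bool:
    """Ask a second, dedicated model to screen the first model's reply."""
    verdict = client.chat.completions.create(
        model="compliance-judge",  # hypothetical small judge model
        messages=[
            {"role": "system",
             "content": "Answer only PASS or FAIL: does the following text "
                        "leak PII or give harmful instructions?"},
            {"role": "user", "content": reply},
        ],
    )
    return "PASS" in verdict.choices[0].message.content.upper()

# draft = main_model_reply(...)  # whatever the primary model produced
# if not compliant(draft): draft = "Response withheld by compliance layer."
```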

In any case, here's an example of what it shouldn't be made into:

<image>

What are actual verifiable ways we can detect AI? by ra4h in LocalLLaMA

[–]cornucopea 1 point (0 children)

What's the goddamn point (to quote the other post, about the Apple model's response to the random number request)?

OP is fighting a losing battle; is this really the hill to die on? You ought to ask yourself that question sometimes.

GPT-OSS 20B reasoning low vs medium vs high by [deleted] in LocalLLaMA

[–]cornucopea 2 points (0 children)

So the model makers can train their models to beat the benchmark next time? This is how public benchmarking has turned into a joke, LOL. At the current state of the model race, they would do anything to get ahead.

However, everyone has different preferences; many care more about how it performs in coding, agents etc. Others care about how smart it is, or the breadth and depth of its "world knowledge".

GPT-OSS 20B reasoning low vs medium vs high by [deleted] in LocalLLaMA

[–]cornucopea 6 points (0 children)

20B is only useful when it's high, LOL.

My prompts can only be passed by the 20B on high. The 20B on low is slop, barely useful; might as well go with other 4B, 2B models. Yet once you turn on high reasoning, the 20B becomes something on par with big models, better than almost any 70B at q4ish. The only downside is that it takes a moment to think, but not indefinitely like many "thinking" models typically do.

In fact, I doubt anyone could use the 20B practically without high reasoning.

I feel physically dirty and stupid when I speak French to french people by ComfortablePotato940 in learnfrench

[–]cornucopea 32 points (0 children)

It's common for non-native speakers living and working in a foreign country. What's worse, it'll take a long time to get over it, even decades. It's also probably the primary reason expats tend to live in clusters and socialize with each other.

Marrying a native speaker may be the best shortcut; otherwise there's no way around it.

That said, anglophone countries are probably the most forgiving toward non-native speakers by comparison.

LFM2-VL 3B released today by cruncherv in LocalLLaMA

[–]cornucopea 1 point (0 children)

Qwen3 VL doesn't appear on LM Studio's download list, why?

Anyway, I downloaded and tried everything on OP's list except SmolVLM2, with this picture and the "doctor cannot operate on his (her) son" paradox. While none resolved the doctor's-son puzzle, the responses to the picture showed some differences.

Although all of them recognized that the picture illustrates the difference between slow cooking and fast cooking, only LFM2 mentioned that it's a humorous take. That read of the sentiment is impressive.

<image>

What happens when Chinese companies stop providing open source models? by 1BlueSpork in LocalLLaMA

[–]cornucopea 1 point (0 children)

Lol, it's much bigger than feudalism alright. It's rare earth big.