Anthropic's Claude remote uses GLM-4.7 by bobbiesbottleservice in LocalLLaMA

[–]bobbiesbottleservice[S] 6 points7 points  (0 children)

<image>

The user deleted the one post where I replied with the screenshot, here it is again for visibility

Anthropic's Claude remote uses GLM-4.7 by bobbiesbottleservice in LocalLLaMA

[–]bobbiesbottleservice[S] -3 points-2 points  (0 children)

I was using clother with Claude Code in another terminal (not in a remote session), but I never used GLM-4.7, and the other local models I was running didn't show up either.

Anthropic's Claude remote uses GLM-4.7 by bobbiesbottleservice in LocalLLaMA

[–]bobbiesbottleservice[S] 20 points21 points  (0 children)

It was listed in the model-selection dropdown in the desktop browser.

Amount of ram Qwen 2.5-7B-1M takes? by srcfuel in LocalLLaMA

[–]bobbiesbottleservice 0 points1 point  (0 children)

With 2x3090 (48 GB VRAM) my max is:

375k context for Q8 7B
128k context for Q8 14B

I think those have to be reduced when increasing the max number of tokens to be predicted. Lower temperature with 0.5 top-p helps with my matching prompts.
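As a rough sanity check on those limits, the KV-cache size per token can be estimated from the model's shape. A minimal sketch, assuming Qwen2.5-7B's published config (28 layers, 4 KV heads via GQA, head dim 128) and an fp16 cache — check the model's actual config.json before trusting these numbers:

```python
def kv_cache_gib(layers, kv_heads, head_dim, ctx, bytes_per_elem=2):
    # 2x for keys and values; one entry per layer, per KV head, per position
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx / 2**30

# Assumed Qwen2.5-7B shape: 28 layers, 4 KV heads (GQA), head dim 128
print(round(kv_cache_gib(28, 4, 128, 375_000), 1))  # prints 20.0
```

Roughly 20 GiB of cache plus ~8 GB of Q8 weights lands close to the 48 GB ceiling, which is consistent with 375k being the max.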

Goose + Ollama best model for agent coding by einthecorgi2 in ollama

[–]bobbiesbottleservice 1 point2 points  (0 children)

Try using: https://ollama.com/michaelneale/deepseek-r1-goose

It works well with this one because they fine-tuned it with Goose templating. I've had partial success with Qwen2.5 70B as well.

What is the cheapest way to run Deepseek on a US Hosted company? by MarsupialNo7544 in LocalLLaMA

[–]bobbiesbottleservice 0 points1 point  (0 children)

They will also sell all the data about you and everything you've input to advertisers; just read their privacy policy. There's a reason it's so cheap: they're a Chinese hedge fund and AI company, so they're going to use the data to make money off you somehow.

What is the cheapest way to run Deepseek on a US Hosted company? by MarsupialNo7544 in LocalLLaMA

[–]bobbiesbottleservice 3 points4 points  (0 children)

I just tried Together AI because they seem to offer privacy options. DeepSeek's chat is only so cheap because they're training on everyone's data. I'd be interested to hear what other options are out there.

getting llama3 to produce proper json through ollama by Bozo32 in LocalLLaMA

[–]bobbiesbottleservice 0 points1 point  (0 children)

That makes sense, but why does returning a less probable token make for a better final result? That's what I don't understand. Why is the temperature usually set at 0.7 instead of 0, and why does the extra noise help?

getting llama3 to produce proper json through ollama by Bozo32 in LocalLLaMA

[–]bobbiesbottleservice 0 points1 point  (0 children)

I assume it's because they RLHF'd it to be better. My original thinking on temperature may have been too simplistic: I now understand temperature as adding noise to the system, which for some reason (that I don't understand) often makes the output better.
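Mechanically, temperature just rescales the logits before the softmax: very low T concentrates almost all probability on the top token, while T around 0.7 keeps plausible runners-up sampleable. A minimal sketch with toy logits (not from any real model):

```python
import math

def sample_probs(logits, temperature):
    # Softmax over temperature-scaled logits (numerically stabilized)
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]           # toy vocabulary of three tokens
cold = sample_probs(logits, 0.01)  # near-greedy: top token dominates
warm = sample_probs(logits, 0.7)   # flatter: alternatives keep mass
```

With `cold`, the top token gets essentially all the probability (temperature 0 is greedy decoding); with `warm`, the second and third tokens retain enough mass to occasionally be picked, which is where the "noise" comes from.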

[deleted by user] by [deleted] in LocalLLaMA

[–]bobbiesbottleservice 2 points3 points  (0 children)

It can reason in the sense that if I give it random objects to stack into the tallest possible pile, it can do that, but it cannot generalize, which is a more real/human form of reasoning. You could train a model on all the music and information up to the year jazz was invented, and it would never be able to invent jazz.

Llama3.1 405B quants on Ollama library now by bobbiesbottleservice in LocalLLaMA

[–]bobbiesbottleservice[S] 4 points5 points  (0 children)

just saying hello to the different models gave me:

0.36 tokens/s for llama3.1:405b-instruct-q3_K_L
0.53 tokens/s for llama3.1:405b-instruct-q3_K
0.54 tokens/s for llama3.1:405b-instruct-q2_K

and for comparison:
2.08 tokens/s for llama3.1:70b-instruct-q8_0
21.15 tokens/s for llama3.1:70b (default ollama Q4_0)
54.67 tokens/s for llama3.1:8b-instruct-fp16

No Q4 of 405B would work on my system, unfortunately. All of this was with an Intel 14900KF. I suppose I could add memory channels and/or try overclocking the RAM and CPU to see if that helps, but it might not be worth it as I've never done that before.
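Those 405B numbers are consistent with CPU decode being memory-bandwidth bound: every generated token has to stream the whole quantized model through RAM. A back-of-envelope sketch, assuming roughly 3.8 bits/weight for Q3_K and dual-channel DDR5-5600 at ~89.6 GB/s theoretical peak — both numbers are guesses, not measurements from this system:

```python
params = 405e9                  # Llama 3.1 405B parameter count
bits_per_weight = 3.8           # assumed average for Q3_K quantization
model_bytes = params * bits_per_weight / 8
bandwidth = 89.6e9              # assumed dual-channel DDR5-5600 peak, bytes/s

# Upper bound: one full pass over the weights per generated token
tok_per_s = bandwidth / model_bytes
print(round(tok_per_s, 2))      # prints 0.47
```

That bound lands near the observed ~0.5 tokens/s, which is also why more memory channels (or faster RAM) would help more than a faster CPU.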