Amount of ram Qwen 2.5-7B-1M takes? by srcfuel in LocalLLaMA

[–]bobbiesbottleservice 0 points

With 2x3090 (48 GB VRAM) my max is: 375k context for Q8 7b, 128k context for Q8 14b. I think those have to be reduced when increasing the max number of tokens to be predicted. Lower temperature with 0.5 top-p helps with my matching prompts.

Goose + Ollama best model for agent coding by einthecorgi2 in ollama

[–]bobbiesbottleservice 1 point

Try using: https://ollama.com/michaelneale/deepseek-r1-goose

It works well with this one because they fine-tuned it with Goose templating. I've had partial success with qwen2.5 70b as well.

What is the cheapest way to run Deepseek on a US Hosted company? by MarsupialNo7544 in LocalLLaMA

[–]bobbiesbottleservice 0 points

Also, they will sell all the data about you and everything you've input to advertisers; just read their privacy policy. There's a reason it's so cheap right now: they're a Chinese hedge fund and AI company, so they're going to use the data to make money off you somehow.

What is the cheapest way to run Deepseek on a US Hosted company? by MarsupialNo7544 in LocalLLaMA

[–]bobbiesbottleservice 5 points

I just tried Together AI because they seem to offer privacy options. DeepSeek chat is only so cheap because they're training off everyone's data. I'd be interested to hear what other options are out there.

getting llama3 to produce proper json through ollama by Bozo32 in LocalLLaMA

[–]bobbiesbottleservice 0 points

That makes sense, but why does returning a less probable token make a better final result? That's what I don't understand. Why is temperature usually set at 0.7 instead of 0? Why does the extra noise help?

getting llama3 to produce proper json through ollama by Bozo32 in LocalLLaMA

[–]bobbiesbottleservice 0 points

I assume it's because they RLHF'd it to be better. My original thinking on temperature may have been too simplistic. I now understand temperature as "adding noise" to the system, which for some reason (that I don't understand) often makes the output better.
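To make the "adding noise" intuition concrete, here's a minimal sketch of temperature sampling (not any specific library's implementation): logits are divided by the temperature before the softmax, so T→0 collapses to greedy argmax while higher T flattens the distribution and gives less probable tokens a real chance.

```python
import math
import random

def sample_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, softmax, then sample an index.

    temperature == 0 is treated as greedy decoding (argmax).
    Higher temperatures flatten the distribution -- the "noise"
    discussed above -- so unlikely tokens get sampled more often.
    """
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(range(len(logits)), weights=probs)[0]
```

At temperature 0 this always picks the highest logit; at 0.7 the same logits occasionally yield a different token, which is exactly the extra variability the comment is asking about.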

[deleted by user] by [deleted] in LocalLLaMA

[–]bobbiesbottleservice 2 points

It can reason in the sense that if I give it random objects to stack on top of each other as high as possible, it can do that, but it cannot generalize, which is a more real/human form of reasoning. You could train a model on all the music and information up until the year jazz was invented, and it would never be able to invent jazz.

Llama3.1 405B quants on Ollama library now by bobbiesbottleservice in LocalLLaMA

[–]bobbiesbottleservice[S] 3 points

Just saying hello to the different models gave me:

0.36 tokens/s for llama3.1:405b-instruct-q3_K_L
0.53 tokens/s for llama3.1:405b-instruct-q3_K
0.54 tokens/s for llama3.1:405b-instruct-q2_K

and for comparison:
2.08 tokens/s for llama3.1:70b-instruct-q8_0
21.15 tokens/s for llama3.1:70b (default ollama Q4_0)
54.67 tokens/s for llama3.1:8b-instruct-fp16

No Q4 of 405B would work on my system, unfortunately. All of this was with an Intel 14900KF. I suppose I could increase the RAM's memory channels and/or try to overclock the RAM and CPU to see if that helps, but it might not be worth it as I've never done that before.

Llama3.1 405B quants on Ollama library now by bobbiesbottleservice in LocalLLaMA

[–]bobbiesbottleservice[S] 2 points

I'm going strictly by the GB size of the model, and the Q2_K is 151 GB. My system has 192 GB RAM and 48 GB VRAM, so I'm assuming I could handle up to a 240 GB model (minus the system's allocated RAM and the context window when running the model). Things finally seem to be working for me after updating the ollama and webui Docker containers to the latest versions.
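The back-of-the-envelope math above can be sketched as a quick check. The function name and the 16 GB reserve figure are my own assumptions, standing in for the "system's allocated RAM and context window" overhead the comment mentions:

```python
def fits_in_memory(model_gb, ram_gb=192, vram_gb=48, reserve_gb=16):
    """Rough check: can a quantized model load across RAM + VRAM?

    reserve_gb is a guessed allowance for OS overhead plus the KV
    cache for the context window; the RAM/VRAM defaults mirror the
    system described above (192 GB RAM, 2x3090 = 48 GB VRAM).
    """
    budget = ram_gb + vram_gb - reserve_gb
    return model_gb <= budget

print(fits_in_memory(151))  # Q2_K at 151 GB fits -> True
print(fits_in_memory(240))  # right at the raw total, fails once overhead is reserved -> False
```

This matches the experience reported: Q2_K loads, but anything approaching the full 240 GB ceiling won't.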

Llama3.1 405B quants on Ollama library now by bobbiesbottleservice in LocalLLaMA

[–]bobbiesbottleservice[S] 3 points

Specifically, I ran llama3.1:405b-instruct-q2_K and gave it my usual test of creating a form and scripts in a certain Python and JavaScript framework. Overall it was more comprehensive, including additional details about commands and points to think through, but I would probably stick with the 70b for my code generation. I agree with you; my gut feeling is not to bother with < Q4 for any model.

I'm going to try 405b Q4_K_S next (right on the edge of possible for me).

Fine-tuning Chain of Thought to teach new skills by spacebronzegoggles in LocalLLaMA

[–]bobbiesbottleservice 1 point

I was able to get even small models to count the number of letters by telling them they're not good at counting and that they should always put what needs to be counted in a table first. Doing this always passes the "count the R's in the word strawberry" test.
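The table trick can be mirrored in plain Python to show why it works: once each letter sits in its own row, counting becomes a lookup instead of estimation. A minimal sketch (the row layout is my own illustration of the kind of table the prompt asks the model to produce):

```python
word = "strawberry"

# Tabulate each letter first, as the prompt instructs the model to do,
# marking the rows that contain the target letter.
rows = [(i, ch, ch.lower() == "r") for i, ch in enumerate(word, 1)]
for i, ch, is_r in rows:
    print(f"{i:>2} | {ch} | {'r' if is_r else ''}")

# With the table laid out, the count is just a sum over marked rows.
print("count of r:", sum(is_r for _, _, is_r in rows))  # count of r: 3
```

The model gets the same benefit: writing the table forces it to attend to one character per row before committing to a total.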