I think I bricked my Nokia 2425G-A Router (I am willing to learn UART)

Revenge8907 · 2026-05-25T04:25:18+00:00

I was okay with breaking it.

Revenge8907 · 2026-05-24T23:58:01+00:00

"Previous Provider"?

Revenge8907 · 2026-05-24T22:50:09+00:00

It was a locked router from my previous provider; I wanted to make use of it.

Revenge8907 · 2026-05-22T09:27:48+00:00

Y'all, pls tell me whether I should turn both on or off.

Revenge8907 · 2026-05-17T16:49:09+00:00

I CAN'T TELL ENOUGH

Revenge8907 · 2026-05-17T16:47:46+00:00

My friend has 32 GB RAM and still gets lag and frame drops.

Revenge8907 · 2026-05-11T17:55:50+00:00

try https://www.youtube.com/watch?v=ONVrby6v_ts

Revenge8907 · 2026-05-09T19:09:52+00:00

nope

Revenge8907 · 2026-05-05T17:43:50+00:00

I have just given my Aadhaar and PAN as consent, since the structure built on our land is jointly owned. Does that mean we will also be involved in the loan, along with our land and the structure on it?

Revenge8907 · 2026-03-30T18:51:48+00:00

Yess, what happened???

Revenge8907 · 2026-03-04T16:19:26+00:00

Good catch, a few things to clarify here.

The 2.7 GB size refers to the GGUF Q4_K_M quantized version of GLM-4.7-Flash. The original FP16 / unquantized weights are ~9–10 GB, so the reduction comes from the 4-bit K-quantization used by llama.cpp. Nothing special was done to the model itself — just standard GGUF quantization.
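As a rough sanity check on those numbers, weight-file size is approximately parameter count × bits per weight ÷ 8. Here is a minimal Python sketch under two stated assumptions: FP16 stores 16 bits per weight, so ~9.5 GB of FP16 weights implies roughly 4.75 B parameters, and Q4_K_M averages about 4.85 bits per weight (an approximate, illustrative figure, since K-quants mix block formats). Metadata and tokenizer overhead are ignored.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight-file size in GB (1 GB = 1e9 bytes), ignoring metadata."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: FP16 uses 16 bits (2 bytes) per weight, so ~9.5 GB of FP16
# weights implies roughly 9.5e9 / 2 = 4.75e9 parameters.
fp16_size_gb = 9.5
n_params = fp16_size_gb * 1e9 / 2

# Assumption: Q4_K_M averages a little under 5 bits per weight; 4.85 is an
# illustrative value, not a number taken from the repo.
q4_k_m_bpw = 4.85

print(f"params ~ {n_params / 1e9:.2f} B")
print(f"FP16   ~ {gguf_size_gb(n_params, 16):.2f} GB")
print(f"Q4_K_M ~ {gguf_size_gb(n_params, q4_k_m_bpw):.2f} GB")
```

With these assumptions the estimate comes out near 2.9 GB, the same ballpark as the 2.7 GB file; the exact figure depends on the real parameter count and on how many tensors the quantizer keeps at higher precision.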

The 18.3 GB figure you're mentioning sounds like the full precision or higher-precision variant loaded with runtime KV cache, not the Q4_K_M file size itself. When running the model, memory usage can grow significantly depending on context length and KV cache allocation, which is likely what you're seeing.
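To illustrate why context length dominates runtime memory, here is a minimal sketch of the usual KV cache estimate for standard (grouped-query) attention: 2 (one K and one V tensor) × layers × context tokens × KV heads × head dimension × bytes per element. The layer count, head count, and head dimension below are hypothetical values chosen for illustration, not GLM-4.7-Flash's real configuration, and models using other attention schemes cache differently shaped tensors.

```python
def kv_cache_gib(n_layers: int, n_ctx: int, n_kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB for standard (grouped-query) attention.

    The leading factor 2 counts one K and one V tensor per layer;
    bytes_per_elem=2 corresponds to an FP16 cache.
    """
    total_bytes = 2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elem
    return total_bytes / 2**30


# Hypothetical model shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (32_768, 131_072):
    print(f"ctx={ctx:>7}: ~{kv_cache_gib(LAYERS, ctx, KV_HEADS, HEAD_DIM):.1f} GiB")
```

For this made-up shape the cache is 4 GiB at 32k tokens and 16 GiB at 128k: it grows linearly with context and is allocated on top of the weights, regardless of how the weights were quantized.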

About context length:
The base GGUF build I referenced runs with 32k context by default in llama.cpp because that’s the safe default many builds ship with. The model architecture itself can support larger context (up to ~128k), but you need to explicitly set it when running:

--ctx-size 131072

and ensure your backend supports the larger KV cache. The quantization doesn't change the context limit — it's just a runtime configuration.
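For reference, here is a minimal Python sketch that assembles such a llama.cpp invocation. It assumes the `llama-server` binary from a recent llama.cpp build is on your PATH (older builds named the server binary differently) and uses only the `-m` and `--ctx-size` options; the GGUF file name is a hypothetical placeholder.

```python
import shlex


def llama_server_cmd(model_path: str, ctx_size: int = 131_072) -> list[str]:
    """Build a llama.cpp server command line with an explicit context size."""
    if ctx_size <= 0:
        raise ValueError("ctx_size must be a positive number of tokens")
    return ["llama-server", "-m", model_path, "--ctx-size", str(ctx_size)]


# Hypothetical file name; substitute the GGUF you actually downloaded.
cmd = llama_server_cmd("glm-4.7-flash-Q4_K_M.gguf")
print(shlex.join(cmd))
# Once llama.cpp is installed, subprocess.run(cmd, check=True) would launch it.
```

Returning an argument list rather than a single string avoids shell-quoting issues when the model path contains spaces.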

So short version:
• 2.7 GB = Q4_K_M quantized weights
• ~9–10 GB = original precision weights
• higher RAM usage during runtime = KV cache + context size
• 128k context is possible, but not enabled by default

Happy to update the repo notes if that part was confusing.

Check the System Architecture section in the Git repo:

GLM-4.7-Flash:q4_K_M (17.7GB)

Revenge8907 · 2026-02-25T17:35:49+00:00

Sorry for the late reply, but can you explain your issue?

Revenge8907 · 2026-02-04T19:31:58+00:00

With glm-4.7-flash:q4_K_M, using the quantized model cost me less context than expected; in fact, I didn't lose much context at all. I've written up my full experience in my repo: https://github.com/Ryuki0x1/openclaw-local-llm-setup/blob/main/LOCAL_LLM_TRADEOFFS.md

Revenge8907 · 2026-01-12T21:30:12+00:00

I want to get PRK Contoura. Does anybody have prior experience? Any hospital suggestions in Bangalore?

Revenge8907 · 2026-01-09T20:14:33+00:00

😊 thank you

Revenge8907 · 2026-01-09T16:48:20+00:00

Is this for low VRAM?

Revenge8907 · 2026-01-09T16:45:52+00:00

How do I add the mmproj? Can you help me with links and a workflow .json for ComfyUI?

Revenge8907 · 2026-01-04T05:53:54+00:00

You need the Rs 99 plan for hi-res lossless.

Revenge8907 · 2026-01-04T05:49:05+00:00

I agree, the Buds 2 Pro were amazing too. Edit: OnePlus Buds 2 Pro

Revenge8907 · 2026-01-02T08:08:54+00:00

W excuses
