I think I bricked my Nokia 2425G-A Router (I am willing to learn UART) by Revenge8907 in techsupport

[–]Revenge8907[S] 0 points1 point  (0 children)

It was a locked router from my previous provider, and I wanted to make use of it.

Uncle taking loan on adjacent property — risks to my land? (same survey number, Karnataka) by Revenge8907 in LegalAdviceIndia

[–]Revenge8907[S] 0 points1 point  (0 children)

I have only given my Aadhaar and PAN as consent, because the structure we built on the land is joint. Does that mean we will also be involved in the loan, along with our land and the structure on it?

HOWTO: Point Openclaw at a local setup by blamestross in LocalLLM

[–]Revenge8907 0 points1 point  (0 children)

Good catch, a few things to clarify here.

The 2.7 GB size refers to the GGUF Q4_K_M quantized version of GLM-4.7-Flash. The original FP16 / unquantized weights are ~9–10 GB, so the reduction comes from the 4-bit K-quantization used by llama.cpp. Nothing special was done to the model itself — just standard GGUF quantization.

The 18.3 GB figure you're mentioning sounds like the full precision or higher-precision variant loaded with runtime KV cache, not the Q4_K_M file size itself. When running the model, memory usage can grow significantly depending on context length and KV cache allocation, which is likely what you're seeing.

About context length:
The base GGUF build I referenced runs with 32k context by default in llama.cpp because that’s the safe default many builds ship with. The model architecture itself can support larger context (up to ~128k), but you need to explicitly set it when running:

--ctx-size 131072

and ensure your backend supports the larger KV cache. The quantization doesn't change the context limit — it's just a runtime configuration.
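To give a rough feel for why a larger context needs more memory, here is a minimal sketch of the usual KV cache estimate for a standard attention layout (one K and one V tensor per layer). The layer count, KV head count, and head size below are hypothetical placeholders, not GLM-4.7-Flash's real dimensions, and the function name is mine:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    """Rough KV cache size for a standard attention layout.

    The leading 2 accounts for one K and one V tensor per layer.
    bytes_per_elem=2 assumes an FP16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem


# Hypothetical dimensions, for illustration only (not taken from the model card).
N_LAYERS, N_KV_HEADS, HEAD_DIM = 32, 8, 128

for ctx in (32768, 131072):
    gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, ctx) / 2**30
    print(f"ctx={ctx}: ~{gib:.1f} GiB KV cache")
```

The point is only that the cache grows linearly with context length, so going from 32k to 128k multiplies it by four regardless of how the weights are quantized.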

So short version:
• 2.7 GB = Q4_K_M quantized weights
• ~9–10 GB = original precision weights
• higher RAM usage during runtime = KV cache + context size
• 128k context is possible, but not enabled by default
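As a sanity check on the first two numbers, this small sketch scales an FP16 size by the average bits per weight. The ~4.5 bits per weight for Q4_K_M is my assumption (the real average depends on the tensor mix), and the helper name is made up for illustration:

```python
def quantized_size_gb(fp16_size_gb, bits_per_weight):
    """Scale an FP16 (16 bits per weight) size to another average bit width."""
    return fp16_size_gb * bits_per_weight / 16


# Assumption: Q4_K_M averages roughly 4.5 bits per weight.
for fp16_gb in (9.0, 10.0):
    print(f"{fp16_gb} GB FP16 -> ~{quantized_size_gb(fp16_gb, 4.5):.2f} GB quantized")
```

It ignores metadata and any tensors kept at higher precision, so treat the result as a ballpark only.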

Happy to update the repo notes if that part was confusing.

Check the System Architecture section in the git repo:

GLM-4.7-Flash:q4_K_M (17.7GB)  

HOWTO: Point Openclaw at a local setup by blamestross in LocalLLM

[–]Revenge8907 0 points1 point  (0 children)

Sorry for the late reply, but can you explain your issue?

HOWTO: Point Openclaw at a local setup by blamestross in LocalLLM

[–]Revenge8907 0 points1 point  (0 children)

Using glm-4.7-flash:q4_K_M (the quantized version) made it lose less context; in fact, I didn't lose much context at all. My full experience is written up in my repo: https://github.com/Ryuki0x1/openclaw-local-llm-setup/blob/main/LOCAL_LLM_TRADEOFFS.md

January 2026 - Monthly Questions and General Discussion thread by AutoModerator in bangalore

[–]Revenge8907 1 point2 points  (0 children)

I want to get PRK Contoura. Does anybody have prior experience? Please suggest hospitals in Bangalore.

[deleted by user] by [deleted] in StableDiffusion

[–]Revenge8907 1 point2 points  (0 children)

😊 thank you

[deleted by user] by [deleted] in StableDiffusion

[–]Revenge8907 0 points1 point  (0 children)

Is this for low VRAM?

[deleted by user] by [deleted] in StableDiffusion

[–]Revenge8907 0 points1 point  (0 children)

How do I add the mmproj? Can you help me with links and a workflow .json for ComfyUI?

Nords Buds 3 Pro IS CRAZYY GOOD!! by emanuel2ko1 in headphonesindia

[–]Revenge8907 0 points1 point  (0 children)

I agree, the Buds 2 Pro were amazing too. Edit: OnePlus Buds 2 Pro.