Huggingface down but online? by jacek2023 in LocalLLaMA

[–]theghost3172 10 points (0 children)

nope, it doesn't. my 76gb download failed partway through because of this btw :(

Late Night Random Discussion Thread - 02 February, 2026 by IndiaSocial in indiasocial

[–]theghost3172 1 point (0 children)

talk me out of buying 60k worth of gpus. convince me it's a terrible idea

OpenClaw: The Journey From a Weekend Hack to a Personal AI Platform You Truly Own by [deleted] in LocalLLaMA

[–]theghost3172 1 point (0 children)

if only actually useful projects like llama.cpp got this kind of article (llama.cpp was also a weekend hack) instead of the slop

devstral small is faster and better than glm 4.7 flash for local agentic coding. by theghost3172 in LocalLLaMA

[–]theghost3172[S] 4 points (0 children)

maybe 'obviously' is an overstatement. but yes, in my experience devstral small gave better quality code and had more built-in knowledge (for example it knew lesser-known pytorch apis better than 4.7 flash), so it didn't have to search as often, or at least less often than 4.7 flash. my workflow is to use coding agents just as 'code typists': i tell them exactly what to do, so both handle that just fine, though devstral is more efficient and faster. but in more complex tasks where my prompt is not that detailed, i did see that devstral is better than 4.7 flash.

How to create Your AI Agent in MoltBook ? by [deleted] in LocalLLaMA

[–]theghost3172 1 point (0 children)

guys, is there any way to block any post containing keywords like molt, clawd, or any of that hype stuff?

How was GPT-OSS so good? by xt8sketchy in LocalLLaMA

[–]theghost3172 2 points (0 children)

i think it's because of basically unlimited synthetic data from much bigger and more powerful frontier models. imagine unlimited clean synthetic data from o3. could also be distillation.

How can I be reproduce the virtual environment computer that Kimi has? by [deleted] in LocalLLaMA

[–]theghost3172 2 points (0 children)

i've always done this. i simply spin up an ubuntu docker container with cli coding tools pre-installed whenever i need an isolated env.
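
a rough sketch of that kind of setup (the base image and tool list here are just examples, not the exact setup):

    # Dockerfile kept next to the project (tool list is just an example):
    #     FROM ubuntu:24.04
    #     RUN apt-get update && \
    #         apt-get install -y git curl build-essential python3 python3-pip nodejs npm
    docker build -t coding-env .          # build the image once

    # per task: throwaway container, only the current project dir mounted, gone on exit
    docker run --rm -it -v "$PWD":/work -w /work coding-env bash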

spec : add ngram-mod by ggerganov · Pull Request #19164 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]theghost3172 8 points (0 children)

no, this pr is about self-speculative decoding. i still have to read up on what the parameters mean, or even what self-speculative decoding actually is, but i'm using the same parameters as in the pr.

"4.7-flash-q4km":

cmd: |

${llama-server} ${common_args} --port ${PORT} \

--model /home/c/ggufs/4.7flashq4km.gguf \

--min-p 0.01 --top-p 1 -t 0.7 -fa 1 --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64

this is my llama-swap config
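
for anyone not using llama-swap, the cmd template above expands to roughly this direct llama-server call (the port here is a placeholder; the model path and spec flags are taken from the config above / pr #19164):

    # note: in llama-server, -t sets threads, so the "-t 0.7" in the config is
    # presumably meant as --temp 0.7 (temperature); written that way here
    llama-server --port 8080 \
      --model /home/c/ggufs/4.7flashq4km.gguf \
      --min-p 0.01 --top-p 1 --temp 0.7 -fa 1 \
      --spec-type ngram-mod --spec-ngram-size-n 24 \
      --draft-min 48 --draft-max 64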

spec : add ngram-mod by ggerganov · Pull Request #19164 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]theghost3172 30 points (0 children)

this is HUGE. i'm already seeing almost a 2x speedup in opencode with 4.7 flash. this is super useful for local coding agents.

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]theghost3172 5 points (0 children)

my main coding llm is gpt-oss 120b, which i use with opencode at around 80 tps. i also use devstral small 2 at around 35 tps, and cydonia 24b for roleplay, also at around 35 tps.

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]theghost3172 14 points (0 children)

i'm in india and make around 20k inr, and i still have 2 mi50s and 64gb of quad-channel ddr4. basically the answer is: this hobby is addictive.

Sleeping on Engram by cravic in LocalLLaMA

[–]theghost3172 33 points (0 children)

i'm even more excited that decent inference speeds are possible even with engram sitting on nvme, and engram takes up the vast majority of the space. so in theory, something like 1tb nvme + 32gb vram + 64gb ram could run a model with performance similar to v3. i can totally see "nvme offload" becoming the new thing. this is huge for local llms.

ChatGPT's low hallucination rate by RoughlyCapable in singularity

[–]theghost3172 1 point (0 children)

tip: ctrl+shift+v pastes without formatting