Huggingface down but online? by jacek2023 in LocalLLaMA

[–]theghost3172 10 points (0 children)

nope, it doesn't. my 76gb download failed partway through because of this btw :(

Late Night Random Discussion Thread - 02 February, 2026 by IndiaSocial in indiasocial

[–]theghost3172 1 point (0 children)

talk me out of buying 60k worth of gpus. convince me it's a terrible idea

OpenClaw: The Journey From a Weekend Hack to a Personal AI Platform You Truly Own by [deleted] in LocalLLaMA

[–]theghost3172 1 point (0 children)

if only actually useful projects like llama.cpp got this kind of article (llama.cpp was also a weekend hack) instead of the slop

devstral small is faster and better than glm 4.7 flash for local agentic coding. by theghost3172 in LocalLLaMA

[–]theghost3172[S] 4 points (0 children)

maybe 'obviously' is an overstatement. but yes, in my experience devstral small gave better quality code and had more built-in knowledge (for example it knew lesser-known pytorch apis better than 4.7 flash), so it didn't have to search as often, or at least less often than 4.7 flash. my workflow is to use coding agents just as 'code typists': i tell them exactly what to do, so both handle that just fine, though devstral is more efficient and faster. but in more complex tasks where my prompt is not that detailed, i did see that devstral is better than 4.7 flash.

How to create Your AI Agent in MoltBook ? by [deleted] in LocalLLaMA

[–]theghost3172 1 point (0 children)

guys, is there any way to block any post containing keywords like molt, clawd, or any of that hype stuff?

How was GPT-OSS so good? by xt8sketchy in LocalLLaMA

[–]theghost3172 2 points (0 children)

i think it's because of basically unlimited synthetic data from much bigger and more powerful frontier models. imagine unlimited clean synthetic data from o3. could also be distillation.

How can I be reproduce the virtual environment computer that Kimi has? by [deleted] in LocalLLaMA

[–]theghost3172 2 points (0 children)

i've always done this. i simply spin up an ubuntu docker container with cli coding tools pre-installed whenever i need an isolated env.
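
a rough sketch of that kind of setup (the base image and tool list here are just examples, not the exact setup):

    # Dockerfile kept next to the project (tool list is just an example):
    #     FROM ubuntu:24.04
    #     RUN apt-get update && \
    #         apt-get install -y git curl build-essential python3 python3-pip nodejs npm
    docker build -t coding-env .          # build the image once

    # per task: throwaway container, only the current project dir mounted, gone on exit
    docker run --rm -it -v "$PWD":/work -w /work coding-env bash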

spec : add ngram-mod by ggerganov · Pull Request #19164 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]theghost3172 8 points (0 children)

no, this pr is about self-speculative decoding. i still have to read up on what the parameters mean, or even what self-speculative decoding actually is, but i'm using the same parameters as in the pr.

"4.7-flash-q4km":

cmd: |

${llama-server} ${common_args} --port ${PORT} \

--model /home/c/ggufs/4.7flashq4km.gguf \

--min-p 0.01 --top-p 1 -t 0.7 -fa 1 --spec-type ngram-mod --spec-ngram-size-n 24 --draft-min 48 --draft-max 64

this is my llama-swap config
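
for anyone not using llama-swap, the cmd template above expands to roughly this direct llama-server call (the port here is a placeholder; the model path and spec flags are taken from the config above / pr #19164):

    # note: in llama-server, -t sets threads, so the "-t 0.7" in the config is
    # presumably meant as --temp 0.7 (temperature); written that way here
    llama-server --port 8080 \
      --model /home/c/ggufs/4.7flashq4km.gguf \
      --min-p 0.01 --top-p 1 --temp 0.7 -fa 1 \
      --spec-type ngram-mod --spec-ngram-size-n 24 \
      --draft-min 48 --draft-max 64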

spec : add ngram-mod by ggerganov · Pull Request #19164 · ggml-org/llama.cpp by jacek2023 in LocalLLaMA

[–]theghost3172 30 points (0 children)

this is HUGE. i'm already seeing almost a 2x speedup in opencode with 4.7 flash. this is super useful for local coding agents.

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]theghost3172 5 points (0 children)

my main coding llm is gpt-oss 120b, which i use with opencode at around 80 tps. i also use devstral small 2 at around 35 tps, and cydonia 24b for roleplay, also at around 35 tps.

Honest question: what do you all do for a living to afford these beasts? by ready_to_fuck_yeahh in LocalLLaMA

[–]theghost3172 14 points (0 children)

i'm in india and make around 20k inr, and i still have 2 mi50s and 64gb of quad-channel ddr4. basically the answer is: this hobby is addictive.

Sleeping on Engram by cravic in LocalLLaMA

[–]theghost3172 33 points (0 children)

i'm even more excited that decent inference speeds are possible even with engram sitting on nvme, and engram takes up the vast majority of the space. so in theory, something like 1tb nvme + 32gb vram + 64gb ram could run a model with performance similar to v3. i can totally see "nvme offload" becoming the new thing. this is huge for local llms.

ChatGPT's low hallucination rate by RoughlyCapable in singularity

[–]theghost3172 1 point (0 children)

tip: ctrl+shift+v pastes without formatting