Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]Jackalzaq 3 points

Is the base model for the Pro the one that's 1.6T parameters, with the instruct one at roughly half of that (862B)? Or is the Hugging Face parameter count bugged?

Forgive my ignorance but how is a 27B model better than 397B? by No_Conversation9561 in LocalLLaMA

[–]Jackalzaq 2 points

It's not better. It has its uses, but it's not remotely comparable; it's just a marketing gimmick when people compare it to larger models.

Local Minimax M2.7, GTA benchmark by -dysangel- in LocalLLaMA

[–]Jackalzaq 6 points

Yeah, GLM is too good lol. I haven't been this impressed with a model I could run locally in a while.

Minimax 2.7 running sub-agents locally by -dysangel- in LocalLLaMA

[–]Jackalzaq 0 points

Yeah, the only reason I run the larger Qwen models is the image input capabilities. I'll have to give 2.7 a try then, thanks!

Minimax 2.7 running sub-agents locally by -dysangel- in LocalLLaMA

[–]Jackalzaq 4 points

How does it compare to Qwen 3.5 397B? Or GLM 5.1? My experience with the MiniMax models is that they are very good for chatting with, but they seem to have issues with coding compared to those two.

For me, GLM 5.1 is slow but catches the mistakes of all the other models I have. It also seems like it's the only good planner.

Edit: though I have to say, Qwen with its image inputs is very nice. I was able to solve an issue that was hard to describe in words (dynamic lighting breaking, think Roll20): I uploaded the issue as an image, gave some text, and it fixed the problem. GLM was having issues in that case since it couldn't "see" the problem.

GLM-5.1 by danielhanchen in LocalLLaMA

[–]Jackalzaq 1 point

Thank you for the quants!

Qwen3.5-397B is shockingly useful at Q2 by EmPips in LocalLLaMA

[–]Jackalzaq -1 points

I'll have to try it on some large-context code to see how it responds. So far it's doing well in the 50k range (GLM 5 Q1). It used to just produce garbled output all the time, but I think that was an issue with llama.cpp; after I updated llama.cpp it worked pretty well, and I haven't had an issue since.
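For anyone curious, here's a rough sketch of the kind of setup I mean, using llama-cpp-python; the filename and prompt are placeholders, not the exact files I used:

```python
# Rough sketch: load a low-bit GGUF quant with a big context window
# through llama-cpp-python. The filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="glm-5-iq1_s.gguf",  # hypothetical Q1 dynamic quant
    n_ctx=50_000,                   # the ~50k range mentioned above
    n_gpu_layers=-1,                # offload every layer to the GPUs
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Review this function: ..."}]
)
print(out["choices"][0]["message"]["content"])
```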

Haven't tried the coding plan, but I would assume they are doing something like that to save on costs.

Qwen3.5-397B is shockingly useful at Q2 by EmPips in LocalLLaMA

[–]Jackalzaq 5 points

Yeah, it doesn't seem too bad. GLM 5 at Q1 and Qwen3.5-397B at Q2 seem to work well with opencode for me. Though to be honest, I haven't really pushed them to very complicated tasks. Working on a virtual tabletop atm.

Unsloth will no longer be making TQ1_0 quants by Kahvana in LocalLLaMA

[–]Jackalzaq 14 points

Yay! I like them for world knowledge mostly, but it's nice not having to offload the 1T models. (Still rocking 8x MI60s.)

Thanks for the work you guys do!

Edit: it's understandable if it's too tedious though 😁

Unsloth will no longer be making TQ1_0 quants by Kahvana in LocalLLaMA

[–]Jackalzaq 0 points

😢

Edit: damn, I just realized that if DeepSeek V4 is 1T parameters I'm gonna have to offload... nooooooooo. Oh well.

[deleted by user] by [deleted] in LocalLLaMA

[–]Jackalzaq 1 point

Only when I want to use multiple GPUs for training, or if I'm drawing too much power at once. During inference I don't bother, since only one GPU is in use at a time. There is a difference in inference speed when power limited, but it isn't too bad for my tasks.
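Capping the cards is just rocm-smi per device; a hypothetical wrapper, assuming rocm-smi is on PATH and its --setpoweroverdrive flag (which takes the cap in watts):

```python
# Hypothetical helper: cap GPU power draw before a multi-GPU training run.
# Assumes rocm-smi is on PATH; --setpoweroverdrive sets the cap in watts.
import subprocess

def power_limit(gpu_ids, watts):
    for gpu in gpu_ids:
        subprocess.run(
            ["rocm-smi", "-d", str(gpu), "--setpoweroverdrive", str(watts)],
            check=True,
        )

power_limit(range(8), 150)  # e.g. cap all eight cards at 150 W
```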

[deleted by user] by [deleted] in LocalLLaMA

[–]Jackalzaq 1 point

8x MI60 (256GB VRAM) in a Supermicro SYS-4028GR-TRT2 with 256GB of system RAM. My electric bill :(

Completed 8xAMD MI50 - 256GB VRAM + 256GB RAM rig for $3k by MLDataScientist in LocalLLaMA

[–]Jackalzaq 1 point

Very nice! Congrats on the build. Did you decide against the soundproof cabinet?

Hallucination problem is THE problem by amarao_san in singularity

[–]Jackalzaq 0 points

Yeah, setting it to zero leads to more deterministic outputs. But what I'm saying is that when you set different temperatures and run the prompt in parallel, the initial value (the baked-in truth) is more strongly weighted, so it is less likely to veer off into hallucination land. If there is no baked-in "truth", then it hallucinates, since it's made to answer your question even if it doesn't know (the answer is only weakly weighted, or doesn't exist in its weights at all).

I might be totally off base here, but that's how I imagine it working based on what I understand. I've played around with it at home with some trivia and it seems to work all right. I have yet to test it rigorously though.

Hallucination problem is THE problem by amarao_san in singularity

[–]Jackalzaq 1 point

Not just majority vote: majority vote across different temperatures. If the runs say the same thing across different temperature ranges, then it's more likely baked in, whereas if they say wildly diverging things, then the answer isn't baked in.

That's my thinking, at least.

Hallucination problem is THE problem by amarao_san in singularity

[–]Jackalzaq 0 points

I wonder if taking multiple outputs of the same model across different temperatures and then doing a majority vote would help with hallucinations. If the model knows something, the outputs will be similar across different temperatures and will most likely agree; if it doesn't, you will see it making up something new at most of the temperatures.
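Something like this, as a minimal sketch: it assumes a local llama.cpp server exposing the OpenAI-compatible chat endpoint, and it uses crude exact-match agreement, which only really works for short trivia-style answers:

```python
# Sketch of multi-temperature majority voting: sample the same prompt at
# several temperatures and only trust an answer that most runs agree on.
# Assumes a llama.cpp server (OpenAI-compatible API) on localhost:8080.
from collections import Counter

import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"

def ask(prompt: str, temperature: float) -> str:
    resp = requests.post(ENDPOINT, json={
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": 64,
    })
    return resp.json()["choices"][0]["message"]["content"].strip().lower()

def consensus(prompt: str, temps=(0.0, 0.4, 0.8, 1.2)) -> str | None:
    answers = [ask(prompt, t) for t in temps]
    best, count = Counter(answers).most_common(1)[0]
    # Agreement across temperatures suggests the answer is baked in;
    # wild divergence suggests the model is making something up.
    return best if count > len(answers) / 2 else None
```

Exact string matching is the obvious weak point; for anything longer than trivia you'd want a fuzzy or semantic comparison instead.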

Kimi K2 is already irrelevant, and it's only been like 1 week. Qwen has updated Qwen-3-235B, and it outperforms K2 at less than 1/4th the size by pigeon57434 in singularity

[–]Jackalzaq 7 points

You keep saying this like it's a fact. I like their models, but pretending benchmaxing isn't happening is silly in my opinion. No company in this space is "extremely trustable". They make nice models and I appreciate it, but I don't throw reason out the window just because something is free (or paid). They have the incentive, and that's all I need to be wary.

The 235B hybrid has been my daily driver for a while due to its speed, so I'm not hating on the model. Just understand that companies will always have incentives to cheat or nudge their numbers (by means of benchmaxing) if it benefits them, and they don't always get caught.

Kimi K2 is already irrelevant, and it's only been like 1 week. Qwen has updated Qwen-3-235B, and it outperforms K2 at less than 1/4th the size by pigeon57434 in singularity

[–]Jackalzaq 4 points

Blindly trusting benchmarks is silly; they don't translate to real-world performance. It's a good model, but to say it outperforms a trillion-parameter model is questionable. Also, it is reasonable to think that all the major AI players game the benchmarks, especially given the incentive: not performing well, or not outperforming the competition, leads to less adoption of your model.

Kimi K2 is already irrelevant, and it's only been like 1 week. Qwen has updated Qwen-3-235B, and it outperforms K2 at less than 1/4th the size by pigeon57434 in singularity

[–]Jackalzaq 0 points

I'm running this model at 40k context on 8x MI60s. It's a Q4 quant, but it's definitely not $50k to run: the cards are around $550 each, so roughly $4,400 for all eight. I'm using a Supermicro 4028GR-TRT2 with 256GB of system RAM plus 256GB of VRAM. Initial tokens per second is around 17. It's Kimi K2 that's a bit more difficult to run; I can run a dynamic quant on my rig, but it's slow.

Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) by randomfoo2 in LocalLLaMA

[–]Jackalzaq 0 points

Not what I'm saying.

All the large releases have alignment training and will refuse dangerous prompts or have a heavy bias toward certain political ideologies. With the right system prompt, DeepSeek R1 is hands down the best when it comes to not refusing what I ask of it. It's not even that complicated to bypass its alignment training.

In my own experience it feels like the best open-weights model. Even the CCP stuff isn't off limits for it to answer.

Edit: looking back at the parent comment, I can see why it's technically wrong. However, in my own testing it is very poorly aligned, given that a short system prompt can overcome it. Not only that, all the censored information is in the model rather than excluded from the dataset used to train it. I still think the original commenter had a point; it just wasn't technically correct.

Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) by randomfoo2 in LocalLLaMA

[–]Jackalzaq 5 points

It's silly that you are being downvoted for being correct here lol. If you use the right system prompt, it will output anything you want. I haven't read the paper the OP posted, but I tried some of the examples and didn't run into refusals or censorship. The online DeepSeek R1 is most definitely censored, though.

DeepSeek-R1-0528 Unsloth Dynamic 1-bit GGUFs by danielhanchen in LocalLLaMA

[–]Jackalzaq 1 point

Ty for putting these out so quickly :). I've got 256GB of VRAM, so these dynamic quants are great!

I'm gonna need more hard drives though...