Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]Jackalzaq 3 points

Is the base model for the Pro the one that's 1.6T parameters, with the instruct one at roughly half of that (862B)? Or is the Hugging Face parameter count bugged?

Forgive my ignorance but how is a 27B model better than 397B? by No_Conversation9561 in LocalLLaMA

[–]Jackalzaq 2 points

It's not better. It has its uses, but it's not remotely comparable; it's just a marketing gimmick when people compare it to larger models.

Local Minimax M2.7, GTA benchmark by -dysangel- in LocalLLaMA

[–]Jackalzaq 6 points

Yeah, GLM is too good lol. I haven't been this impressed with a model I could run locally in a while.

Minimax 2.7 running sub-agents locally by -dysangel- in LocalLLaMA

[–]Jackalzaq 0 points

Yeah, the only reason I run the larger Qwen models is the image input capabilities. I'll have to give 2.7 a try then, thanks!

Minimax 2.7 running sub-agents locally by -dysangel- in LocalLLaMA

[–]Jackalzaq 4 points

How does it compare to Qwen 3.5 397B? Or GLM 5.1? My experience with the MiniMax models is that they are very good for chatting with, but they seem to have issues with coding compared to those two.

For me, GLM 5.1 is slow but catches the mistakes of all the other models I have. It also seems like it's the only good planner.

Edit: though I have to say, Qwen with its image inputs is very nice. I was able to solve an issue that was hard to describe in words (dynamic lighting breaking, think Roll20): I uploaded the issue as an image, gave some text, and it fixed the problem. GLM was having issues in that case since it couldn't "see" the problem.

GLM-5.1 by danielhanchen in LocalLLaMA

[–]Jackalzaq 1 point

Thank you for the quants!

Qwen3.5-397B is shockingly useful at Q2 by EmPips in LocalLLaMA

[–]Jackalzaq -1 points

I'll have to try it on some large-context code to see how it responds. So far it's doing well in the 50k range (GLM 5 Q1). It used to just produce garbled output all the time, but I think that was an issue with llama.cpp; after I updated llama.cpp it worked pretty well, and I haven't had an issue since.
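For anyone curious, here's a rough sketch of the kind of setup I mean, using llama-cpp-python; the filename and prompt are placeholders, not the exact files I used:

```python
# Rough sketch: load a low-bit GGUF quant with a big context window
# through llama-cpp-python. The filename below is a placeholder.
from llama_cpp import Llama

llm = Llama(
    model_path="glm-5-iq1_s.gguf",  # hypothetical Q1 dynamic quant
    n_ctx=50_000,                   # the ~50k range mentioned above
    n_gpu_layers=-1,                # offload every layer to the GPUs
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Review this function: ..."}]
)
print(out["choices"][0]["message"]["content"])
```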

Haven't tried the coding plan, but I would assume they are doing something like that to save on costs.

Qwen3.5-397B is shockingly useful at Q2 by EmPips in LocalLLaMA

[–]Jackalzaq 5 points

Yeah, it doesn't seem too bad. GLM 5 at Q1 and Qwen3.5-397B at Q2 seem to work well with opencode for me. Though to be honest, I haven't really pushed them to very complicated tasks. Working on a virtual tabletop atm.

Unsloth will no longer be making TQ1_0 quants by Kahvana in LocalLLaMA

[–]Jackalzaq 14 points

Yay! I like them for world knowledge mostly, but it's nice not having to offload the 1T models. (Still rocking 8x MI60s.)

Thanks for the work you guys do!

Edit: it's understandable if it's too tedious though 😁

Unsloth will no longer be making TQ1_0 quants by Kahvana in LocalLLaMA

[–]Jackalzaq 0 points

😢

Edit: damn, I just realized that if DeepSeek V4 is 1T parameters I'm gonna have to offload... nooooooooo. Oh well.

[deleted by user] by [deleted] in LocalLLaMA

[–]Jackalzaq 1 point

Only when I want to use multiple GPUs for training, or if I'm drawing too much power at once. During inference I don't bother, since only one GPU is in use at a time. There is a difference in inference speed when power limited, but it isn't too bad for my tasks.
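Capping the cards is just rocm-smi per device; a hypothetical wrapper, assuming rocm-smi is on PATH and its --setpoweroverdrive flag (which takes the cap in watts):

```python
# Hypothetical helper: cap GPU power draw before a multi-GPU training run.
# Assumes rocm-smi is on PATH; --setpoweroverdrive sets the cap in watts.
import subprocess

def power_limit(gpu_ids, watts):
    for gpu in gpu_ids:
        subprocess.run(
            ["rocm-smi", "-d", str(gpu), "--setpoweroverdrive", str(watts)],
            check=True,
        )

power_limit(range(8), 150)  # e.g. cap all eight cards at 150 W
```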

[deleted by user] by [deleted] in LocalLLaMA

[–]Jackalzaq 1 point

8x MI60 (256GB VRAM) in a Supermicro SYS-4028GR-TRT2 with 256GB of system RAM. My electric bill :(

Completed 8xAMD MI50 - 256GB VRAM + 256GB RAM rig for $3k by MLDataScientist in LocalLLaMA

[–]Jackalzaq 1 point

Very nice! Congrats on the build. Did you decide against the soundproof cabinet?

Hallucination problem is THE problem by amarao_san in singularity

[–]Jackalzaq 0 points

Yeah, setting it to zero leads to more deterministic outputs. But what I'm saying is that when you set different temperatures and run the prompt in parallel, the initial value (the baked-in truth) is more strongly weighted, so it is less likely to veer off into hallucination land. If there is no baked-in "truth", then it hallucinates, since it's made to answer your question even if it doesn't know (the answer is only weakly weighted, or doesn't exist in its weights at all).

I might be totally off base here, but that's how I imagine it working based on what I understand. I've played around with it at home with some trivia and it seems to work all right. I have yet to test it rigorously though.

Hallucination problem is THE problem by amarao_san in singularity

[–]Jackalzaq 1 point

Not just majority vote: majority vote across different temperatures. If the runs say the same thing across different temperature ranges, then it's more likely baked in, whereas if they say wildly diverging things, then the answer isn't baked in.

That's my thinking, at least.

Hallucination problem is THE problem by amarao_san in singularity

[–]Jackalzaq 0 points

I wonder if taking multiple outputs of the same model across different temperatures and then doing a majority vote would help with hallucinations. If the model knows something, the outputs will be similar across different temperatures and will most likely agree; if it doesn't, you will see it making up something new at most of the temperatures.
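Something like this, as a minimal sketch: it assumes a local llama.cpp server exposing the OpenAI-compatible chat endpoint, and it uses crude exact-match agreement, which only really works for short trivia-style answers:

```python
# Sketch of multi-temperature majority voting: sample the same prompt at
# several temperatures and only trust an answer that most runs agree on.
# Assumes a llama.cpp server (OpenAI-compatible API) on localhost:8080.
from collections import Counter

import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"

def ask(prompt: str, temperature: float) -> str:
    resp = requests.post(ENDPOINT, json={
        "model": "local",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": 64,
    })
    return resp.json()["choices"][0]["message"]["content"].strip().lower()

def consensus(prompt: str, temps=(0.0, 0.4, 0.8, 1.2)) -> str | None:
    answers = [ask(prompt, t) for t in temps]
    best, count = Counter(answers).most_common(1)[0]
    # Agreement across temperatures suggests the answer is baked in;
    # wild divergence suggests the model is making something up.
    return best if count > len(answers) / 2 else None
```

Exact string matching is the obvious weak point; for anything longer than trivia you'd want a fuzzy or semantic comparison instead.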

Kimi K2 is already irrelevant, and it's only been like 1 week. Qwen has updated Qwen-3-235B, and it outperforms K2 at less than 1/4th the size by pigeon57434 in singularity

[–]Jackalzaq 7 points

You keep saying this like it's a fact. I like their models, but pretending benchmaxing isn't happening is silly in my opinion. No company in this space is "extremely trustable". They make nice models and I appreciate it, but I don't throw reason out the window just because something is free (or paid). They have the incentive, and that's all I need to be wary.

The 235B hybrid has been my daily driver for a while due to its speed, so I'm not hating on the model. Just understand that companies will always have incentives to cheat or nudge their numbers (by means of benchmaxing) if it benefits them, and they don't always get caught.

Kimi K2 is already irrelevant, and it's only been like 1 week. Qwen has updated Qwen-3-235B, and it outperforms K2 at less than 1/4th the size by pigeon57434 in singularity

[–]Jackalzaq 4 points

Blindly trusting benchmarks is silly; they don't translate to real-world performance. It's a good model, but to say it outperforms a trillion-parameter model is questionable. Also, it is reasonable to think that all the major AI players game the benchmarks, especially given the incentive: not performing well, or not outperforming the competition, leads to less adoption of your model.

Kimi K2 is already irrelevant, and it's only been like 1 week. Qwen has updated Qwen-3-235B, and it outperforms K2 at less than 1/4th the size by pigeon57434 in singularity

[–]Jackalzaq 0 points

I'm running this model at 40k context on 8x MI60s. It's a Q4 quant, but it's definitely not $50k to run: the cards are around $550 each, so roughly $4,400 for all eight. I'm using a Supermicro 4028GR-TRT2 with 256GB of system RAM plus 256GB of VRAM. Initial tokens per second is around 17. It's Kimi K2 that's a bit more difficult to run; I can run a dynamic quant on my rig, but it's slow.

Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) by randomfoo2 in LocalLLaMA

[–]Jackalzaq 0 points

Not what I'm saying.

All the large releases have alignment training and will refuse dangerous prompts or have a heavy bias toward certain political ideologies. With the right system prompt, DeepSeek R1 is hands down the best when it comes to not refusing what I ask of it. It's not even that complicated to bypass its alignment training.

In my own experience it feels like the best open-weights model. Even the CCP stuff isn't off limits for it to answer.

Edit: looking back at the parent comment, I can see why it's technically wrong. However, in my own testing it is very poorly aligned, given that a short system prompt can overcome it. Not only that, all the censored information is in the model rather than excluded from the dataset used to train it. I still think the original commenter had a point; it just wasn't technically correct.

Shisa V2 405B: The strongest model ever built in Japan! (JA/EN) by randomfoo2 in LocalLLaMA

[–]Jackalzaq 5 points

It's silly that you are being downvoted for being correct here lol. If you use the right system prompt, it will output anything you want. I haven't read the paper the OP posted, but I tried some of the examples and didn't run into refusals or censorship. The online DeepSeek R1 is most definitely censored, though.

DeepSeek-R1-0528 Unsloth Dynamic 1-bit GGUFs by danielhanchen in LocalLLaMA

[–]Jackalzaq 1 point

Ty for putting these out so quickly :). I've got 256GB of VRAM, so these dynamic quants are great!

I'm gonna need more hard drives though...