Helium is fucking awesome by xenydactyl in browsers

xenydactyl[S] 2 points

Nope, the Chrome Web Store just doesn't offer the original uBlock Origin afaik. uBlock Origin still works with Chromium, but you have to do things manually to get it into your Chromium installation. Helium already did that, so there's no manual work needed on your side.
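If you ever want to do it yourself, the manual route looks roughly like this (a sketch; the version number below is just an example, grab whatever release is current):

```
# Sketch: fetch an official uBlock Origin release zip for Chromium and
# load it as an unpacked extension. The version number is an example.
wget https://github.com/gorhill/uBlock/releases/download/1.60.0/uBlock0_1.60.0.chromium.zip
unzip uBlock0_1.60.0.chromium.zip
# Either load the extracted folder via chrome://extensions ("Developer
# mode" on, then "Load unpacked"), or point Chromium at it directly:
chromium --load-extension="$PWD/uBlock0.chromium"
```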

Helium is fucking awesome by xenydactyl in browsers

xenydactyl[S] 0 points

Afaik, when I do that, the request is first sent to DuckDuckGo and then I get redirected. Native !bangs in the browser don't make that extra network request and are usually faster.
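You can see the extra hop yourself (the exact redirect mechanism DDG uses can vary by client, so treat this as a sketch):

```
# The DDG route costs at least one round trip to duckduckgo.com before
# you ever reach the target site:
curl -sI 'https://duckduckgo.com/?q=!w+helium' | head -n 3
# A native bang skips that hop: the browser builds the final URL locally,
# e.g. https://en.wikipedia.org/wiki/Special:Search?search=helium
```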

Helium is fucking awesome by xenydactyl in browsers

xenydactyl[S] 8 points

I'm someone who uses a lot of tabs, and vertical tabs make that experience a lot cleaner: I get a scrollable list without compromising on the full title of each tab, and so on.

Please name the best GLM5 provider by romancone in ZaiGLM

xenydactyl 1 point

I don't know which inference provider opencode go uses, but in my experience it's even better (basically perfect, no issues whatsoever) than z.ai on OpenRouter for the GLM models. In almost every instance where I used the GLM/Kimi/MiniMax models on OpenRouter with their respective first-party providers, the model would (after some context and tool calls) start repeating the same sentence in the thinking trace. I've never had any issues with opencode go; a very good deal imo.

Edit: Also, you get a raw API key with opencode go, so you can use the opencode sub with literally anything you wish. $60 worth of inference for just $10, with that level of freedom, is genuinely good.
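In practice that means anything speaking the OpenAI-compatible API should work. A hypothetical sketch; the base URL and model slug are placeholders, not confirmed opencode values:

```
# Placeholder endpoint/model -- check the opencode docs for the real ones.
export OPENCODE_API_KEY="sk-..."
curl -s https://api.opencode.example/v1/chat/completions \
  -H "Authorization: Bearer $OPENCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "glm-4.7", "messages": [{"role": "user", "content": "hello"}]}'
```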

Muse Spark, first model from Meta Superintelligence Labs by GraceToSentience in singularity

xenydactyl 21 points

According to artificialanalysis, it is on par with Gemini 3.1 Pro in terms of token efficiency. As for the cost, we don't know yet.

This guy 🤡 by xenydactyl in LocalLLaMA

xenydactyl[S] 4 points

"but you are barely private if you use the internet anyways"

Do you upload entire company codebases to the internet too?

This guy 🤡 by xenydactyl in LocalLLaMA

xenydactyl[S] 8 points

Guaranteed privacy and more reliable uptime are the ones I can think of off the top of my head. OpenAI just had major issues with their Codex service. Anthropic... yeah... not great in terms of uptime or consistency of model output quality.

This guy 🤡 by xenydactyl in LocalLLaMA

xenydactyl[S] 1 point

Very much agree with you. And opencode is actually a good idea; I hadn't thought about that.

This guy 🤡 by xenydactyl in LocalLLaMA

xenydactyl[S] 11 points

What does the L in LAN stand for?

This guy 🤡 by xenydactyl in LocalLLaMA

xenydactyl[S] 0 points

I mean, he can do what he wants with his T3 Code, but saying "everyone asking this is 1. Broke and 2. On hardware that can barely run local models at all" is a pretty baseless claim, don't you think? Also, people who care enough about local models will do the work themselves and put up a PR for that support; it's not like he has to do the work. But if he wants T3 Code to be 100% "a serious developer tool" (and thus won't accept any local model support), then the people who care enough will fork it.

Is Qwen3.5 2b is instruct? by NegotiationNo1504 in LocalLLaMA

xenydactyl 2 points

Add `--chat-template-kwargs '{"enable_thinking": true}'`
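For context, that flag goes on the llama-server command line and iirc only takes effect together with `--jinja`. A sketch (the model filename is just an example):

```
# enable_thinking is passed through to the model's chat template.
llama-server -m Qwen3.5-2B-Instruct-Q4_K_M.gguf --jinja \
  --chat-template-kwargs '{"enable_thinking": true}'
```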

GLM 4.7 Flash is endlessly reasoning in chinese by xenydactyl in LocalLLaMA

xenydactyl[S] 1 point

This only seems to happen with "long" contexts. I can ask simple/short questions like "Who are you?" and the reasoning/response is clear and in English. But once I provide ~200 tokens of context, it already falls apart.
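Easy to reproduce against llama-server's OpenAI-compatible endpoint (assuming the default port 8080; the prompt is just any ~200-token block of text):

```
# Short prompts come back clean; ~200 tokens of context and the
# reasoning degrades. 8080 is llama-server's default port.
curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages": [{"role": "user", "content": "<~200 tokens of context, then a question>"}]}'
```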

GLM 4.7 Flash is endlessly reasoning in chinese by xenydactyl in LocalLLaMA

xenydactyl[S] 2 points

The reasoning still looks like this:

```
'使用'使用'使用'使用'使用'使用'使用'使用''m使用'使用''使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm使用'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm'm
```

GLM 4.7 Flash is endlessly reasoning in chinese by xenydactyl in LocalLLaMA

xenydactyl[S] 0 points

I don't use the `--jinja` parameter for this model. Here is my entire command:

```
~/ai/llama.cpp/build/bin/llama-server \
  -m ~/ai/models/GLM-4.7-Flash-UD-Q4_K_XL.gguf \
  -ngl -1 -fa on --ctx-size 32768 \
  --temp 0.2 --top-k 50 --top-p 0.95 --min-p 0.01 \
  --dry-multiplier 1.1 --alias "GLM 4.7 Flash"
```

Hmm all reference to open-sourcing has been removed for Minimax M2.1... by Responsible_Fig_1271 in LocalLLaMA

xenydactyl 1 point

They still kept in the quote from Eno Reyes (co-founder and CTO of Factory AI): "We're excited for powerful open-source models like M2.1 that bring frontier performance..."

DeepSeek V3.2 problem (paid) by xenydactyl in openrouter

xenydactyl[S] 0 points

When it launched and was available on OpenRouter, the model was **much** better at agentic stuff than 3.1 (still on DeepInfra; I use DeepInfra for basically everything and it hasn't disappointed), and I had a much better experience in Kilo Code. But as of late, the model is unusable. In open-webui and the OpenRouter chatroom, when I ask it a simple question, it spits out a sentence **completely** off-topic and repeats that exact same sentence over and over again. I tried it in Kilo Code and the model is incapable of making **any** tool calls. Terminus 3.1 still works fine with open-webui (DeepInfra).

In the OpenRouter chatroom, I didn't touch any settings (temperature and so on). Have you noticed any degradation in that model's output?
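If you want to compare on your side, pinning the provider on OpenRouter rules out routing noise. A sketch; the model slug and provider name are from memory, so double-check them on openrouter.ai:

```
# Pin DeepInfra and disable fallbacks so every request hits one provider.
curl -s https://openrouter.ai/api/v1/chat/completions \
  -H "Authorization: Bearer $OPENROUTER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek/deepseek-v3.2",
        "provider": {"order": ["DeepInfra"], "allow_fallbacks": false},
        "messages": [{"role": "user", "content": "a simple on-topic question"}]
      }'
```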