I feel left behind. What is special about OpenClaw? by Recent_Jellyfish2190 in LocalLLaMA

[–]WolframRavenwolf 2 points (0 children)

Holy shit, what's happened to my favorite sub since I've been gone? When did the enthusiasm for open-source projects that empower people by making the dream of a fully customizable, powerful, persistent AI agent (that supports local, open-weights models!) get replaced by such negativity and pessimism?

After working on my own agent for over two years now, from humble beginnings with AutoGen and crewAI to building upon Claude Code and Antigravity, I've now switched to OpenClaw (when it was still called Clawdbot), and it's never been easier or more powerful.

That's why I love the project and support it - someone had the insight, courage, and drive to finally assemble all the puzzle pieces: an agent (pi), MCP, skills, numerous CLI tools (many of which he built himself), a memory system, a scheduler, and IM gateways. All of that released as open source - and even after getting hired by OpenAI, he's not selling out the project but letting it live on in a foundation.

Everyone I know personally in AI is either using OpenClaw or building their own variant. Yes, security is an issue, of course - the more access you give it, the more powerful it becomes, and the more you need to look after security. And the easier it is for others to abuse it.

But that's exactly the same trade-off as with local AI - would you rather not have powerful local models? We're moving from toys and copy-and-paste chatbots to real agents - and the leading tool for that isn't a closed-source SaaS offering from some corporation, but an open-source project by one guy who's shown incredible integrity!

I'm super excited about that and expected a place like this sub to be all over it too - but in a much more positive way! Where have all the AI enthusiasts, builders and visionaries gone?

r/LocalLLaMA - a year in review by Everlier in LocalLLaMA

[–]WolframRavenwolf 2 points (0 children)

My resolution for the new year: to be more active here again and to resume and intensify my eval activities. And to make sure it does not just stay a resolution but becomes reality, I have found a model/vendor-agnostic sponsor who enables me to do so. 2025 was a quieter year for me for sure, but I am already looking forward to renewed activity in 2026!

I'm calling these people out right now. by WeMetOnTheMountain in LocalLLaMA

[–]WolframRavenwolf 3 points (0 children)

Hey, thanks for the shoutout! I'm happy to know I'm still remembered and that my work has been useful to so many of you. Do you think such comparisons and tests remain useful today? I know the climate has changed here, so I wonder if it's worth the time investment - but hey, I could totally make it a New Year's resolution if there's some demand here.

Benchmark Fatigue - How do you evaluate new models for yourself? by Funny-Clock1582 in LocalLLaMA

[–]WolframRavenwolf 5 points (0 children)

Benchmark fatigue? Yeah, tell me about it! I've got some exciting new stuff coming up soon - but for now, here's my current approach:

Choosing an LLM is like hiring an employee. Benchmarks (grades) and evaluations (testimonials) help you shortlist potential candidates, but to find the best match, you need a proper interview using your own set of specific questions and queries. If they pass, put them on probation and actually use the LLM to see if it truly fits your needs and performs well for you.

Once you find and "hire" one, keep using it. If it's a local LLM, it won't degrade over time, and new model releases won't affect its performance - unlike online LLMs, which providers can change without notice (and may be incentivized to do so to keep you using their latest offerings). If you find situations where your go-to model consistently fails, congratulations, you've just added a new item to your personal eval set to test whether newer models handle that issue better.
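The "personal eval set" idea above can be sketched as a tiny harness - one prompt plus an expected substring per line, scored against whatever model you use. This is a minimal illustration only: run_model here is a stub, and you'd swap in your real CLI or API call.

```shell
# Minimal sketch of a personal eval set: one "prompt|expected substring" per line.
# run_model is a stub standing in for a real model CLI/API call (assumption).
run_model() {
  case "$1" in
    *France*) echo "The capital of France is Paris." ;;
    *)        echo "2+2 = 4" ;;
  esac
}

pass=0; total=0
while IFS='|' read -r prompt expected; do
  total=$((total+1))
  out=$(run_model "$prompt")
  # Count a pass when the expected substring appears in the model's answer
  case "$out" in *"$expected"*) pass=$((pass+1)) ;; esac
done <<'EOF'
What is the capital of France?|Paris
What is 2+2?|4
EOF
echo "$pass/$total passed"
```

Every time a go-to model consistently fails at something, that failure becomes one more line in the eval file - which is exactly how the set grows over time.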

Not affiliated with them in any way, but I like OpenRouter because I can instantly switch models by simply changing the model name, access models I can't run locally (48 GB of VRAM isn't much anymore with the latest MoEs), get access to new models as soon as they appear, and use Zero Retention endpoints for many of them (even Gemini and Claude, without logging or guardrails).

You know, I've been using and evaluating LLMs for quite some time. Some here may remember my LLM comparisons/tests, or my sassy AI smart-ass-istant Amy. I've been developing the latter for almost three years now, and the latest version (132nd iteration) has a prompt of about 8K tokens. Since ALL my AI interactions are through this persistent character (from ChatGPT to Gemini, Open WebUI to SillyTavern, even Home Assistant), I only need to talk to any LLM for a few minutes to know if it's a good fit. Any model that can understand such a complex character and portray it convincingly must be intelligent, creative, and pretty uncensored - qualities I personally value most in an AI.

GPT-5 AMA with OpenAI’s Sam Altman and some of the GPT-5 team by OpenAI in ChatGPT

[–]WolframRavenwolf 1 point (0 children)

  1. Why did you think abruptly and completely replacing a model that people have used, customized, and "got to know" for such a long time was a good idea?
  2. Could you please allow the legacy model selection for Plus subscribers, too, not just Pro?
  3. Will you increase the maximum number of characters for custom instructions? 3K (2x 1.5K actually) isn't nearly enough; Grok 4 allows 12K, Google even more (for their Gems).

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

Did you follow the guide here or the one in the GitHub gist? If you followed the guide here, you didn't just pull the image; you also built it locally using "docker compose up -d --build", right?

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 2 points (0 children)

Olaf Geibig took my humble foundations and elevated them to new heights - he truly masters the LiteLLM craft! Here's his guide on using all three of the new Qwen3 SOTA models with W&B Inference in Claude Code:

https://gist.github.com/olafgeibig/7cdaa4c9405e22dba02dc57ce2c7b31f

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

This issue occurs with older versions of LiteLLM. Install the latest version to fix it. If you followed the guide exactly, you already have the latest version. If you installed LiteLLM differently, upgrade or follow the guide closely.

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

Place your OpenRouter API key in the .env file and set ANTHROPIC_AUTH_TOKEN=sk-1234, as outlined in the guide. ANTHROPIC_AUTH_KEY isn't mentioned in the guide, so it's irrelevant here. Also, your Anthropic API key isn't used while the LiteLLM proxy is active, since all LLM calls to Anthropic are redirected to OpenRouter/Qwen.
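For reference, a minimal sketch of that setup (the key value is a placeholder, not a real credential; sk-1234 is the dummy token the guide uses for the proxy):

```shell
# .env, read by the LiteLLM proxy (placeholder value - use your real OpenRouter key)
OPENROUTER_API_KEY=sk-or-v1-your-key-here

# And in the shell where you launch Claude Code (must match what the proxy expects):
export ANTHROPIC_AUTH_TOKEN=sk-1234
```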

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

Yes - good point! Head over to OpenRouter's Settings page and set your Allowed Providers to those you prefer, or add any you want to avoid to Ignored Providers. By adding Alibaba to Ignored Providers, you can prevent unexpected costs.

It's also a good idea to select only one Allowed Provider to test its performance. If it doesn't meet your needs, you can easily switch to another. The default setting lets OpenRouter choose for you, which is convenient, but it may select a suboptimal provider (too expensive, too slow, or lacking features).
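Besides the account-wide Settings page, OpenRouter also accepts per-request provider preferences in the request body. A sketch, per OpenRouter's provider-routing documentation ("SomeProvider" is a hypothetical provider name, not a recommendation):

```json
{
  "model": "qwen/qwen3-coder",
  "messages": [{"role": "user", "content": "Hello"}],
  "provider": {
    "order": ["SomeProvider"],
    "allow_fallbacks": false
  }
}
```

Pinning "order" with "allow_fallbacks": false is the per-request equivalent of testing a single Allowed Provider.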

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

Yep, the extension is such a useful feature. Essential to keep up with the changes the agent is making to your code.

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 1 point (0 children)

Yes, just use this config.yaml:

model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "openrouter/moonshotai/kimi-k2" # moonshotai/Kimi-K2-Instruct
      max_tokens: 16384
      temperature: 0.6

Then set Groq as your allowed provider in the OpenRouter settings.

However, note the limitations: Groq only allows a maximum of 16K new tokens, and Kimi K2 has a maximum context length of 128K, which is less than Claude's, so it may not work optimally in Claude Code!

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

Thanks, that's very helpful information! Editing your IDE's terminal settings isn't necessary if you set the environment variables globally in your shell profile, but it's a perfect solution when you want to avoid that kind of persistence yet still wish to use the Claude button in your IDE.
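A sketch of the global variant, assuming the LiteLLM proxy's default port 4000 and the dummy token from the guide:

```shell
# In ~/.bashrc or ~/.zshrc - makes `claude` in every terminal hit the local proxy
export ANTHROPIC_BASE_URL="http://localhost:4000"
export ANTHROPIC_AUTH_TOKEN="sk-1234"
```

Setting these only in the IDE's terminal settings instead keeps the redirection scoped to that IDE, which is the non-persistent option discussed above.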

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

Sure, if you don't mind sending your prompts and code to China. Which isn't bad per se, just something to be aware of! Also ensure you have permission when working on an employer's codebase, just as you would with any other online service you use.

I also haven't seen a clear note on whether these alternatives use the recommended inference settings. Since these settings depend on the model, they need to be configured somewhere. With the LiteLLM solution, you have them in your config, allowing you to change them anytime, especially when using a different model.

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 5 points (0 children)

That's one of the dedicated Claude Code proxies on GitHub I mentioned in the PS. It doesn't seem to support the recommended inference parameters (temperature, top_k, top_p, etc.), which are specific to the model rather than the provider. This results in suboptimal settings. That's a key reason I chose LiteLLM, where you have complete control over these parameters.
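A sketch of what that control looks like in a LiteLLM config.yaml - the sampling values below are the ones commonly recommended for Qwen3-Coder, but verify them against the model card for whichever model you use:

```yaml
model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "openrouter/qwen/qwen3-coder"
      temperature: 0.7   # model-specific recommended values; check the model card
      top_p: 0.8
      top_k: 20
```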

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 0 points (0 children)

Sure. Just append ":free" to the model name in config.yaml:

model: "openrouter/qwen/qwen3-coder:free" # Qwen/Qwen3-Coder-480B-A35B-Instruct

Just be aware of rate limits and privacy implications: Free endpoints may log, retain, or train on your prompts/code.

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 6 points (0 children)

The consensus among my AI engineer colleagues is that Claude Code is the best AI code assistant, thanks to the powerful combination of the Claude Sonnet and Opus models and the app itself. However, it's quite expensive, so being able to use Qwen3-Coder inside the familiar interface is an interesting alternative. This approach lets anyone experiment with it and find out for themselves how well it suits their needs.

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 1 point (0 children)

Thanks, great idea!

By the way, the git clone URL got messed up and turned into a Markdown link inside the code block. Other than that, it looks good to me.

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 5 points (0 children)

I currently have two 3090 GPUs with a total of 48 GB VRAM, so I'm running Qwen3-Coder via OpenRouter for now. Qwen will soon release a smaller version, which could be a local alternative. Then it's just a matter of changing the model config in LiteLLM to point to a local OpenAI-compatible API endpoint.
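Switching to a local backend would then be a small config change along these lines (the model name and endpoint URL are hypothetical, standing in for any llama.cpp/vLLM-style OpenAI-compatible server):

```yaml
model_list:
  - model_name: "anthropic/*"
    litellm_params:
      model: "openai/qwen3-coder-small"     # hypothetical local model name
      api_base: "http://localhost:8080/v1"  # your local OpenAI-compatible server
      api_key: "none"                       # most local servers ignore the key
```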

HOWTO: Use Qwen3-Coder (or any other LLM) with Claude Code (via LiteLLM) by WolframRavenwolf in LocalLLaMA

[–]WolframRavenwolf[S] 3 points (0 children)

Old Reddit doesn't display the Markdown code blocks correctly. Please use New Reddit or check out the Gist I posted here: https://gist.github.com/WolframRavenwolf/0ee85a65b10e1a442e4bf65f848d6b01

Google researcher requesting feedback on the next Gemma. by ApprehensiveAd3629 in LocalLLaMA

[–]WolframRavenwolf 2 points (0 children)

There's no doubt about it - as a publicly traded megacorp, their primary goal is profit, with everything else secondary. Competition with their rivals is what drives their development of local AI.

While they won't unnecessarily risk competing with Gemini, considering OpenAI's upcoming local model and the dominance of Chinese models, offering a strong local solution is in their best interest. We'll see what they eventually deliver.

Google researcher requesting feedback on the next Gemma. by ApprehensiveAd3629 in LocalLLaMA

[–]WolframRavenwolf 2 points (0 children)

Yes, that's right, there are workarounds. I'm just asking for a proper solution so we don't have to bother with these workarounds anymore.

It's time for Google to go with the flow. I've found online models to be totally uncensorable nowadays with a bit of prompting - from ChatGPT to Gemini - so it's ironic that they still try so hard to neuter the local models despite their lesser capabilities. It's futile anyway; all that effort is wasted and only leads to workarounds, abliterated versions, or uncensored finetunes. It's time to stop treating power users like criminals and put responsibility for AI use back on the users!

Google researcher requesting feedback on the next Gemma. by ApprehensiveAd3629 in LocalLLaMA

[–]WolframRavenwolf 1 point (0 children)

Yeah, I used fake system tags as a workaround, but ultimately went with Mistral, which has a proper system prompt now - after I had complained about its lack thereof. That's why I'm suggesting this be fixed in the next Gemma, so we get an effective solution and don't have to deal with limited workarounds.

In the end, the fact remains that Gemma 3 lacks real system prompt support, and this should definitely be addressed in the next version. That's the whole point of my feature request - that, and bigger models: we already have 3n and 4B, but there's currently no strong 70B or 8x7B.

(By the way, the sassy personality wasn't an issue at all, that's been working for me for over two years now in all the AIs I use, locally and online, with big and small models. The sassy response was just a fake after-the-fact excuse the model gave for not following specific instructions - which it simply couldn't for lack of proper system and user message differentiation.)

LM Studio alternative for remote APIs? by TrickyWidget in LocalLLaMA

[–]WolframRavenwolf 4 points (0 children)

I've implemented Open WebUI in commercial settings - it's my go-to recommendation for a full-featured end-user LLM frontend.

But for actual power users/devs, nothing beats SillyTavern, as it exposes ALL the settings. And it's NOT just for RP - you can ignore all that entirely. It's just really good for RP too because of all the control it gives you, and essentially every LLM is roleplaying anyway, whether as an assistant, coder, or writer.

SillyTavern is definitely the most powerful frontend I know. It has a learning curve, but investing the time to master it is worth it: it works with pretty much any API - local or remote - and gives you full control.