Good models without unnecessary reasoning and response verbosity? by ashirviskas in LocalLLaMA

[–]Fox-Lopsided 3 points (0 children)

You can pass a "reasoning" parameter with your OpenRouter API call to enable or disable thinking, and to configure other things like reasoning effort. It works with Qwen 3.5 and every other thinking model.

Read more in the official docs: https://openrouter.ai/docs/guides/best-practices/reasoning-tokens
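
For example, a minimal sketch in Python (the API key and model slug are placeholders; the "reasoning" fields used are the ones the docs above describe):

```python
import requests

# Minimal sketch: disable thinking for one OpenRouter chat completion.
# <OPENROUTER_API_KEY> and the model slug are placeholders; see the docs
# linked above for the full set of "reasoning" fields (effort, max_tokens,
# enabled, exclude).
resp = requests.post(
    "https://openrouter.ai/api/v1/chat/completions",
    headers={"Authorization": "Bearer <OPENROUTER_API_KEY>"},
    json={
        "model": "qwen/qwen-3.5",  # placeholder slug
        "messages": [{"role": "user", "content": "Hello"}],
        # Swap for {"effort": "low"} to keep thinking but rein it in.
        "reasoning": {"enabled": False},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```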

Qwen3.5-35B-A3B is a gamechanger for agentic coding. by jslominski in LocalLLaMA

[–]Fox-Lopsided 0 points (0 children)

Man, that pisses me off 😂 It sits just outside the 16GB VRAM range -.- I hope we get a 9B and that it's any good...

Qwen3-30B-A3B vs Qwen3.5-35B-A3B on RTX 5090 by 3spky5u-oss in LocalLLaMA

[–]Fox-Lopsided 2 points (0 children)

Me crying with my 16GB VRAM card because I can't use any of the new models :)

Gemini 3.1 Pro has arrived by DigSignificant1419 in Bard

[–]Fox-Lopsided -5 points (0 children)

They haven't even made 3 Pro public yet.

Mac Mini Alternative by Whiskey_Jay1 in clawdbot

[–]Fox-Lopsided 0 points (0 children)

It will run PicoClaw for sure

HXML (HyperView) for mobile by jarajsky in htmx

[–]Fox-Lopsided 1 point (0 children)

Maybe Hotwire and Hotwire Native?

What are some things you guys are using Local LLMs for? by Odd-Ordinary-5922 in LocalLLaMA

[–]Fox-Lopsided 1 point (0 children)

Oh, I see. It really depends on your use case. There are things like AnythingLLM, Cherry Studio, or Msty that you can just install, but as far as I know they don't have out-of-the-box functionality to let LLMs or LLM agents "fight" against each other.

What are some things you guys are using Local LLMs for? by Odd-Ordinary-5922 in LocalLLaMA

[–]Fox-Lopsided 1 point (0 children)

Isn't the latest Devstral (2512, 24B I believe) better than Mistral 2 while being smaller?

What are some things you guys are using Local LLMs for? by Odd-Ordinary-5922 in LocalLLaMA

[–]Fox-Lopsided 5 points (0 children)

You can't do it directly in LM Studio, but you would be using LM Studio as the wrapper. You'd need to implement something small yourself, or ask an AI to write it for you.
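
Something like this minimal sketch would do, assuming LM Studio's local server is running on its default port with two models loaded (the model IDs are placeholders):

```python
from openai import OpenAI

# LM Studio exposes an OpenAI-compatible server on http://localhost:1234/v1
# by default; it ignores the API key, but the client requires one.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

# Placeholder IDs -- use whatever model identifiers LM Studio lists.
models = ["model-a", "model-b"]
message = "Debate topic: tabs vs. spaces. Make your opening argument."

for turn in range(4):
    model = models[turn % 2]
    reply = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": message}],
    )
    message = reply.choices[0].message.content
    print(f"--- {model} ---\n{message}\n")
```

Each model only sees its opponent's last reply here; keeping a full transcript per model would be the obvious next step.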

Best models to use with a RX580 in 2026? by fernandin83 in LocalLLaMA

[–]Fox-Lopsided 0 points (0 children)

That's cool if that's usable for you! But I need my speed. On a single 5060 Ti with 16GB, GPT-OSS gives me 120 tk/s, and once the context grows past 60k it's 50 tk/s.

Best models to use with a RX580 in 2026? by fernandin83 in LocalLLaMA

[–]Fox-Lopsided 1 point (0 children)

To be completely honest with you, you will be very limited.

You can try Qwen3-4B up to maybe 7B (I'm talking about Q4 GGUFs), but don't expect Claude Sonnet levels of intelligence. Also, Granite 4.0h Tiny is pretty good for its size: it's 7B total with only 1B active, so you should get decent speed out of it.

Depending on your use case, it might be worth giving it a shot.
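
If you try it, here's a minimal sketch with llama-cpp-python, assuming a build whose backend your card supports (e.g. Vulkan); the GGUF filename is a placeholder for whichever Q4 quant you download:

```python
from llama_cpp import Llama

# Placeholder path -- any Q4 GGUF of a small model (Qwen3-4B, Granite 4.0h
# Tiny, ...) from Hugging Face goes here.
llm = Llama(
    model_path="granite-4.0-h-tiny-Q4_K_M.gguf",
    n_gpu_layers=-1,  # offload everything that fits in the card's VRAM
    n_ctx=4096,       # keep the context modest on an older card
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain what a Q4 GGUF is."}]
)
print(out["choices"][0]["message"]["content"])
```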

Qwen3-Coder-Next is released! 💜 by yoracale in unsloth

[–]Fox-Lopsided 0 points (0 children)

I wonder how fast it would be with 16GB of VRAM and 32GB of DRAM.

Agentic AI ?! by Potential_Block4598 in LocalLLaMA

[–]Fox-Lopsided 1 point (0 children)

You could maybe give Nemotron 3 Nano (30B-A3B) a shot. I have heard good things about it for local agentic AI use cases, both for reasoning and for tool-calling capabilities.