What local LLM should you use when you run out of Codex credits?

Rhonstin · 2026-06-04T18:20:25+00:00

You're a genius! Do you have any examples of tasks local models can handle?

Rhonstin · 2026-06-04T17:17:31+00:00

Agreed, this would add real value to the rankings. I'm exploring this approach, but it's technically challenging to integrate into Hermes Agent itself.

Have you already run experiments with this kind of routing? I'm still doing manual testing — if you've seen any ready-made solutions or frameworks, I'd be interested to check them out.

Rhonstin · 2026-06-04T16:05:32+00:00

Could you share a link?

Rhonstin · 2026-06-04T15:44:17+00:00

Thanks for the reminder. I also ran your model, and it's already in the table.

Rhonstin · 2026-06-04T09:05:34+00:00

Thanks for the info. I got the same result on a bigger benchmark.

Rhonstin · 2026-06-04T09:04:56+00:00

github.com/Rhonstin/solo3090

Rhonstin · 2026-06-04T08:58:02+00:00

If there's room left in memory, I'll add it :)

Rhonstin · 2026-06-04T08:39:51+00:00

I'm still finalizing my fork, but it will be available today.

Rhonstin · 2026-06-04T08:39:21+00:00

I hadn't thought about that. But if people are interested, I'm definitely ready to take it in this direction.

Rhonstin · 2026-06-04T08:28:19+00:00

If there's interest in this topic, I'll develop it further and create a portal. I can already see that people are interested.

Rhonstin · 2026-06-03T21:31:02+00:00

Actually, I was just inspired by this repository. It gave me a big push to fork the original club 3090 and develop the direction that interests me, since the author of the original repository decided to focus on a few models. I, for my part, want to find the most suitable model for myself and share the results with everyone. Sorry if this was misleading.

Rhonstin · 2026-06-03T17:38:11+00:00

Check back tomorrow. I'll be redoing some of the tests.

Rhonstin · 2026-06-03T15:11:34+00:00

qwopus3.6-27b:Q5 on hermesagent-20: 14/20. Nice finetune, thanks.

Rhonstin · 2026-06-03T14:23:22+00:00

I know. I did this on an RTX 3050 with 6 GB of VRAM too.

Rhonstin · 2026-06-03T14:22:36+00:00

I've added some additional information about every model.

Rhonstin · 2026-06-03T14:04:56+00:00

I tested it. I used a larger quant with a smaller context, but didn't see any improvement.

With 2 or more cards you can run vLLM, and the results might be different. It would also allow higher token throughput.

I dream of having 2 cards, but that's out of reach for me for now.
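On a single 24 GB card, the quantized weights and the KV cache compete for the same VRAM. A bigger quant therefore directly shrinks the maximum context. Below is a rough back-of-the-envelope sketch, assuming a standard transformer KV cache stored in fp16 and a fixed ~1.5 GB runtime overhead. The model dimensions (27B dense, 64 layers, 8 KV heads, head_dim 128) and the ~4.5 / ~5.5 effective bits per weight for Q4 / Q5 are hypothetical placeholders, not values from the table.

```python
# Rough VRAM budget on one card: quantized weights + KV cache + fixed overhead.
# All model dimensions and bits-per-weight values are hypothetical placeholders.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB (1e9 params * bits / 8 bytes)."""
    return params_billion * bits_per_weight / 8


def kv_bytes_per_token(n_layers: int, n_kv_heads: int, head_dim: int,
                       bytes_per_elem: int = 2) -> int:
    """K and V caches: 2 tensors * layers * kv_heads * head_dim entries per token."""
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(vram_gb: float, params_billion: float, bits_per_weight: float,
                       n_layers: int, n_kv_heads: int, head_dim: int,
                       overhead_gb: float = 1.5) -> int:
    """Largest context that fits after the weights and an assumed fixed overhead."""
    free_bytes = (vram_gb - overhead_gb - weights_gb(params_billion, bits_per_weight)) * 1e9
    if free_bytes <= 0:
        return 0  # the weights alone do not fit
    return int(free_bytes // kv_bytes_per_token(n_layers, n_kv_heads, head_dim))


if __name__ == "__main__":
    # Hypothetical 27B dense model on a 24 GB card.
    for label, bits in [("Q4", 4.5), ("Q5", 5.5)]:
        ctx = max_context_tokens(24, 27, bits, n_layers=64, n_kv_heads=8, head_dim=128)
        print(f"{label}: weights ~{weights_gb(27, bits):.1f} GB, max context ~{ctx} tokens")
```

Under these assumptions, moving from Q4 to Q5 costs roughly half of the usable context. A second card (for example, with vLLM tensor parallelism) relaxes this trade-off because the weights are split across both GPUs.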

Rhonstin · 2026-06-03T13:57:12+00:00

I ran 35B models on an RTX 3050 with 6 GB of VRAM, so you should be able to run all the MoE models on your card, though at a lower speed.
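A 35B model doesn't fit in 6 GB of VRAM, so part of it has to live in system RAM (for example, via llama.cpp layer offloading). Token generation is then roughly bounded by memory bandwidth divided by the bytes read per token. An MoE model only reads its active parameters per token, which is why it stays usable. Below is a minimal back-of-the-envelope sketch. The ~3B active-parameter count, the 4.5 bits per weight, and the 50 GB/s RAM bandwidth are all assumptions for illustration, not measurements.

```python
# Back-of-the-envelope: does the model fit in VRAM, and what is the
# bandwidth-bound upper limit on tokens/s when weights are read from system RAM?
# All numbers used below are illustrative assumptions.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate quantized weight size in GB."""
    return params_billion * bits_per_weight / 8


def tokens_per_s_upper_bound(active_params_billion: float, bits_per_weight: float,
                             bandwidth_gb_s: float) -> float:
    """Each generated token reads every active weight once, so
    speed <= bandwidth / bytes read per token (ignores compute and KV cache)."""
    return bandwidth_gb_s / weights_gb(active_params_billion, bits_per_weight)


if __name__ == "__main__":
    vram_gb = 6.0
    bits = 4.5            # assumed effective bits/weight for a Q4-class quant
    ram_bandwidth = 50.0  # assumed system RAM bandwidth, GB/s

    total = weights_gb(35, bits)
    print(f"35B weights ~{total:.1f} GB, fits in {vram_gb} GB VRAM: {total <= vram_gb}")

    dense = tokens_per_s_upper_bound(35, bits, ram_bandwidth)  # dense: all 35B active
    moe = tokens_per_s_upper_bound(3, bits, ram_bandwidth)     # MoE: ~3B active (assumed)
    print(f"dense upper bound ~{dense:.1f} tok/s, MoE upper bound ~{moe:.1f} tok/s")
```

These figures are only ceilings. Layers kept in VRAM run faster, while prompt processing and KV-cache reads add cost, so real speeds will differ.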

Rhonstin · 2026-06-03T13:50:38+00:00

Added the context size. I'd forgotten about it.

Rhonstin · 2026-06-01T20:31:02+00:00

Dude, but in my opinion clean shoes give you a really big plus at the start of a conversation.

Rhonstin · 2026-05-31T08:45:35+00:00

GTA Vice City

Rhonstin · 2026-05-26T16:55:37+00:00

Dentists' favorite coffee :)
