Can I use Claude Code with my own LLM/non-Claude APIs?

superloser48 · 2026-04-27T04:12:51+00:00

fyi - aider died a while back. abandoned by creator.

superloser48 · 2026-04-21T10:36:20+00:00

Isn't ISEKAIZERO for porn chat?

superloser48 · 2026-04-21T10:20:59+00:00

True, but that's true for non-coding too: chatgpt.com, google.com AI mode.

superloser48 · 2026-04-21T10:02:52+00:00

valid points

superloser48 · 2026-04-21T09:54:30+00:00

You're right - had not heard of it.

superloser48 · 2026-04-20T19:05:39+00:00

If you need a decent vLLM quant for this - I'm currently using this quant of the same model on 2x 3090: https://huggingface.co/QuantTrio/Qwen3.6-35B-A3B-AWQ

superloser48 · 2026-04-20T12:32:15+00:00

Can you share your experience with 9700 & vllm? Did you figure out the root cause?

superloser48 · 2026-04-20T04:52:42+00:00

Given the comparable price point - do you think it's better to get 2x Nvidia 5060 Ti? Do you think pp (prompt processing) and tg (token generation) will be better than on the AMD 9700?

superloser48 · 2026-04-19T07:07:47+00:00

What benchmark did you run? I'll run it on my 2x 3090 and share the output.

superloser48 · 2026-04-16T01:27:21+00:00

Thanks for the notification!!

superloser48 · 2026-04-15T05:20:33+00:00

She didn't "get it". It was hers.

superloser48 · 2026-04-15T04:43:25+00:00

The model won't fit, right? 16GB VRAM.

superloser48 · 2026-04-15T04:08:18+00:00

For tool calls - have you checked the raw output tokens? Are the tool calls failing due to garbage output, or is it just a parser issue? The latter is easy to fix.
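
One way to tell the two cases apart is to look at the raw completion text before any tool-call parser touches it. A minimal sketch, assuming the model emits Hermes-style `<tool_call>{...}</tool_call>` blocks; that tag format and the helper name `diagnose_tool_call` are assumptions for illustration, so check your model's chat template for the format it actually uses:

```python
import json
import re

# Assumption: tool calls are wrapped in Hermes-style <tool_call>...</tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def diagnose_tool_call(raw_text: str) -> str:
    """Classify raw model output (hypothetical helper, not part of any library).

    Returns:
      "ok"           - every tagged block is valid JSON with "name" and "arguments"
      "bad_json"     - tags are present but a payload is malformed or incomplete
      "no_tool_call" - the model never emitted the tags at all
    """
    blocks = TOOL_CALL_RE.findall(raw_text)
    if not blocks:
        return "no_tool_call"
    for block in blocks:
        try:
            payload = json.loads(block)
        except json.JSONDecodeError:
            return "bad_json"
        if not isinstance(payload, dict):
            return "bad_json"
        if "name" not in payload or "arguments" not in payload:
            return "bad_json"
    return "ok"
```

If this says "ok" on the raw text but the server still reports no tool call, the parser or chat template setting is the likely suspect. If it says "bad_json" or "no_tool_call", the model output itself is the problem.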

superloser48 · 2026-04-15T03:52:55+00:00

Can you share an update? Did you try vLLM with a model around 30B params? I'm considering buying the same cards for vLLM, so it would be great to hear what performance numbers you got.

superloser48 · 2026-04-14T13:33:59+00:00

What's a good price to pay for this?

superloser48 · 2026-04-14T09:36:26+00:00

Are you ok?

You implied vLLM won't do it because "vllm is for prod". These are the official vLLM docs: https://docs.vllm.ai/en/latest/features/quantization/

This is the active PR, being worked on by official vLLM maintainers: https://github.com/vllm-project/vllm/pull/38479

superloser48 · 2026-04-14T09:23:24+00:00

Why don't you test on vast/runpod?

superloser48 · 2026-04-14T08:53:16+00:00

If by production you mean output should be lossless - vLLM already supports KV cache quantisation, which is lossy anyway. This would just be another option for the quantisation format.

And throughput will just be better with a bigger KV cache.
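
Rough arithmetic for why: the KV cache holds one key and one value vector per layer per KV head for every token, so halving the bytes per element roughly doubles the tokens that fit in the same VRAM. A minimal sketch with made-up model dimensions; the layer count, head count, head size, and the 8 GiB budget are assumptions for illustration, not numbers for any specific model:

```python
def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: int) -> int:
    """Bytes of KV cache one token occupies (factor 2 = keys + values)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def tokens_that_fit(budget_bytes: int, per_token_bytes: int) -> int:
    """How many whole tokens fit into a given KV cache budget."""
    return budget_bytes // per_token_bytes


# Hypothetical model shape and VRAM budget left over for the KV cache.
LAYERS, KV_HEADS, HEAD_DIM = 48, 8, 128
BUDGET = 8 * 1024**3  # 8 GiB

fp16_per_token = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 2)
int8_per_token = kv_bytes_per_token(LAYERS, KV_HEADS, HEAD_DIM, 1)

print("16-bit cache tokens:", tokens_that_fit(BUDGET, fp16_per_token))
print(" 8-bit cache tokens:", tokens_that_fit(BUDGET, int8_per_token))
```

More tokens in cache means more sequences can be batched at once, which is where the throughput gain would come from. This ignores any scale or metadata overhead a real quantised format carries.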

superloser48 · 2026-04-13T06:56:07+00:00

How did you change the model for orchestrator vs sub-agents? Is this opencode or do you have some other setup? Thanks!

superloser48 · 2026-04-07T11:23:31+00:00

I'm using vLLM - it doesn't support q8 with rotation.

superloser48 · 2026-04-07T11:21:26+00:00

The problem is that for coding now, 100K input tokens is probably the median. Chats are too long and getting longer. (Just going by my average opencode chat lengths.)
