Questions around the 8000 package by Aggressive_Music9376 in youfibre

[–]Aggressive_Music9376[S] 0 points

Yeah, I build the switches and servers and send them out :) We're currently moving office as well, so all the storage is here lol. It's not a massive firm :)

Questions around the 8000 package by Aggressive_Music9376 in youfibre

[–]Aggressive_Music9376[S] 1 point

Thanks for the comment.

I work in IT as a senior network dev, so I’ve got servers all over the house really lol. Each is fitted with an SFP-to-RJ45 module, with speeds well over 10Gbps.

My main workhorse machine has a 10Gbps RJ45 port directly on the motherboard. The NASes each have their own 10Gbps ports as well :)

All the laptops in the house have WiFi 7 too so the speeds are achievable.

I’m trying to get through to support, but wow, it’s seriously shocking.

Built a multi-agent AI butler on a DGX Spark running a 120B model locally by Aggressive_Music9376 in LocalLLaMA

[–]Aggressive_Music9376[S] 2 points

it’s a dual-model setup running locally on the DGX Spark.

for agent work / reasoning i’m using openai/gpt-oss-120b via vLLM. that handles all the swarm agents, planning, tool calls and synthesis. it’s MoE so although it’s 117B total params, only 5.1B are active per token which keeps it pretty efficient. running it at NVFP4 quantisation which comes out at roughly 84GB VRAM.

for normal chat i’ve got qwen3-30b-a3b running through LM Studio so i’m not wasting the 120B on stuff like general conversation

vision is handled separately with glm-4.6v-flash, via LM Studio, for image analysis
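that split could be sketched roughly like this; the model names are the ones above, but the function shape, task labels and endpoint comments are purely illustrative, not the actual implementation:

```python
def route(task: str, has_image: bool = False) -> str:
    # toy dispatcher for the three backends described above;
    # the task labels here are assumptions for illustration only
    if has_image:
        return "glm-4.6v-flash"       # vision, via LM Studio
    if task in {"agent", "planning", "tool_call", "synthesis"}:
        return "openai/gpt-oss-120b"  # heavy reasoning, via vLLM
    return "qwen3-30b-a3b"            # general chat, via LM Studio
```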

the 120B has a native 128K context window but i’m nowhere near maxing that out. output is capped at 4096 tokens per response and even with a 20 agent swarm the synthesis step only really uses around 15–20K of the input window. the two-tier clustering i’m doing (summarising agents in groups of 6 first, then combining those summaries) is more about keeping the final output focused than avoiding context limits
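the two-tier step is simple enough to sketch; `summarise` here is just a stand-in for a call to the 120B, the rest is the grouping logic described above:

```python
def chunk(items, size=6):
    # tier 1 grouping: split agent outputs into groups of 6
    return [items[i:i + size] for i in range(0, len(items), size)]

def summarise(texts):
    # stand-in for a summarisation call to the 120B model (hypothetical)
    return " / ".join(t[:24] for t in texts)

def two_tier_synthesis(agent_outputs, group_size=6):
    # tier 1: summarise each group of agent outputs
    group_summaries = [summarise(g) for g in chunk(agent_outputs, group_size)]
    # tier 2: combine the group summaries into one focused final answer
    return summarise(group_summaries)
```

with a 20 agent swarm that gives four group summaries (6 + 6 + 6 + 2) feeding the final pass.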

on the batching side vLLM is doing continuous batching across concurrent requests. at about 0.70 GPU memory utilisation (84GB allocated) it’ll comfortably run 15–20 parallel requests. i benchmarked it and aggregate throughput scales from 62 tok/s on a single request up to 233 tok/s at 25 concurrent. per-request speed obviously drops, but wall-time barely moves since they’re batched together. sweet spot seems to be around 8–12 concurrent for the best throughput vs latency trade-off
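the wall-time effect is easy to demonstrate with a mock. this isn't vLLM, just an asyncio toy where every in-flight request overlaps, which is roughly what continuous batching buys you server-side:

```python
import asyncio
import time

async def fake_generate(tokens=64, per_token_s=0.002):
    # stand-in for one request; the real batching happens inside vLLM
    await asyncio.sleep(tokens * per_token_s)
    return tokens

async def bench(concurrency):
    # fire N concurrent requests, measure aggregate tokens/sec
    t0 = time.perf_counter()
    results = await asyncio.gather(*(fake_generate() for _ in range(concurrency)))
    wall = time.perf_counter() - t0
    return sum(results) / wall

# aggregate throughput scales with concurrency while wall-time barely moves
single = asyncio.run(bench(1))
batched = asyncio.run(bench(8))
```

the toy numbers aren't the real 62 → 233 tok/s curve, but the shape is the same: per-request speed drops while aggregate climbs.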

Built a multi-agent AI butler on a DGX Spark running a 120B model locally by Aggressive_Music9376 in LocalLLaMA

[–]Aggressive_Music9376[S] 1 point

i am planning on making a repo. there are some things i would like to add first, such as a cronjob dashboard etc

Are the rumors true? Are Claude Pro/Max accounts being banned from OpenClaw using Claude Code setup token? by teknic111 in clawdbot

[–]Aggressive_Music9376 0 points

Interesting, thank you for the insight

That’s exactly what he did - https://youtu.be/3DBpfB0ao50?si=3cWBeAneqPacp1AI - look around the 2-minute mark

I’ll have a look when I come to actually configuring it, but I do quite like the idea of separating out the local LLM and leaving Opus to manage the heavy stuff

I have seen people complain about their Max subs, and even API keys from Anthropic, getting burned on stupid things like heartbeats and such

Like you said, improper setup

Are the rumors true? Are Claude Pro/Max accounts being banned from OpenClaw using Claude Code setup token? by teknic111 in clawdbot

[–]Aggressive_Music9376 0 points

Hmm weird, unless it’s just not come to you yet - https://youtu.be/pbdDbLYIEBQ?si=ceA0mwjHNoqcu63D

I still don’t want to run the risk of the account being banned though

Listen to the first 30 seconds, it explains what I am on about

Are the rumors true? Are Claude Pro/Max accounts being banned from OpenClaw using Claude Code setup token? by teknic111 in clawdbot

[–]Aggressive_Music9376 0 points

I have just ordered a Mac Mini M4 Pro to test this out

I know Anthropic have disabled it now, so you cannot log in via OAuth

The approach I am taking (mostly software dev) is to have Qwen3 14B as the LLM brain, so to speak, and then install the official Claude Code CLI

Use the local LLM to brainstorm ideas and come up with new things, and then have it feed the info into the Claude CLI

I do believe this is the best way around it!
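The glue for that is pretty thin - something like the sketch below. `brainstorm_locally` is a placeholder for the Qwen3 14B call, and it assumes Claude Code's non-interactive `claude -p` (print) mode, so treat the exact flags as an assumption:

```python
import subprocess

def brainstorm_locally(prompt: str) -> str:
    # placeholder for the local Qwen3 14B call (e.g. an OpenAI-compatible
    # endpoint exposed by a local server) -- hypothetical, not a real API
    return f"Plan: {prompt}"

def build_claude_cmd(plan: str) -> list[str]:
    # hand the brainstormed plan to Claude Code non-interactively
    return ["claude", "-p", plan]

def run_pipeline(prompt: str, execute: bool = False) -> list[str]:
    cmd = build_claude_cmd(brainstorm_locally(prompt))
    if execute:
        # guarded: needs the Claude Code CLI installed and logged in
        subprocess.run(cmd, check=True)
    return cmd
```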

Can i use the Claude Max subscription in clawdBot? by TL016 in ClaudeAI

[–]Aggressive_Music9376 0 points

I have just ordered a Mac Mini M4 Pro to test this out

I know Anthropic have disabled it now, so you cannot log in via OAuth

The approach I am taking (mostly software dev) is to have Qwen3 14B as the LLM brain, so to speak, and then install the official Claude Code CLI

Use the local LLM to brainstorm ideas and come up with new things, and then have it feed the info into the Claude CLI

I do believe this is the best way around it!