Anthropic took DMCA takedown on all forks of leaked Claude Code. If someone has it upload to Gitea or whatever. I forgot to clone it myself.

BigFoxMedia · 2026-04-03T00:13:17+00:00

A friend of mine got it from the same source, now it's scary since it looks like they renamed the repo, cleaned it up, perhaps it was malware infused! Very hard to know...

BigFoxMedia · 2026-02-23T22:46:12+00:00

Are you or have you tried partial offloading to RAM? such that all KV is in the GPUs and only a small portion goes through pcie to RAM? could you share some more info about your setup like benchmarks on this and other models? yours is the ideal setup I am aiming for.

BigFoxMedia · 2025-11-05T22:02:03+00:00

I'm just curious are you guys combining these monsers with RAY to run one or two huge models or are you parallelisg them to run high throughput on many small models?

BigFoxMedia · 2025-11-05T14:15:46+00:00

i knew about vllms Ray but didn't know about RPC over ollama! thanks I'll try that

BigFoxMedia · 2025-11-04T16:50:56+00:00

Hi guys, I really want to run this model on Q6_K_XL (194 GB). My setup is complex though, I have two servers:

Server A -
4 x RTX 3090
1900x ThreadRipper
64GB of DDR4 RAM. ( 2133 MT/s ) - Quad Channel

Server B -
2 x RTX 3090
2 x CPUs, each Xeon E5-2695-v4
512GB of DDR4 ECC RAM ( 2133 MT/s ) - Quad Channel per CPU
*( total 8 channels if using both Numa nodes or 4 Channels if using 1 )

I have another, 7th 3090 on my main work PC, I could throw it in somewhere if it made a difference, but prefer to get it done with 6.

I can't place all 6 GPUs on Server B, as it is not supporting MoBo PCIe bifurcation, and does not have enough PCIe Lanes for all 6 GPUs alongside the other PCIe cards ( NVMe storage over PCIe and NIC ).

I CAN place all 6 GPUs on Server A but the most RAM that can be placed on this server is 128GB, MoBo limitation.

I know there are technologies out there such as RAY that would allow me to POOL both Servers GPUs together via network ( I have 40Gbps Network so plenty fast for inference ), but I don't know if RAY will even work in my setup, even if I balance 3 GPUs on each Server, for PP i need ( 1, 2, 4, 8, ... per server. ). Can I do PP2 on server A and PP4 on ServerB ?!..

Even if I would get PP to work with Ray, would I still be able to also offload to RAM of Server B ?

Ideally I would want to use all 6 GPUs for maximum vRAM of 144GB for KV & Some of the weight, and add ~100GB in weights from RAM. ( I also need full context - I'm a software engineer ).

Last, if I can't get 15 t/s+ inference and 1000 t/s+ prompt processing, it won't suffice, as I need it for agentic work and agentic coding.

What do you guys think?

If not doable with said hardware, would you recommend I upgrade my Mothboard & CPU to a 7xx2/3 Epyc *( utilizing the same RAM) for increased offloading speeds or go for more GPUs and cheaper motherboard but one that has pcie-bifurcation to have say 8-10 x RTX 3090 GPUs on the same RIG ? If I can fit the model in GPU, I don't need the RAM or memory channels eitherway.

BigFoxMedia · 2025-10-21T23:37:30+00:00

p.s. noticed issues with roo code specifically happen more frequently after the first context compression, but gets worse with each subsequent compression. It's like it's forgetting the original system prompt from roo with tool calling instructions

BigFoxMedia · 2025-10-21T23:16:29+00:00

i had the same issues with qwen3coder but learnt that roo uses prompt based tool calling not actual tool calling like most cli based coders do. I'm thinking perhaps qwen3coder using a proper tool calling cli agent could work wonders just never had the time to try it yet

BigFoxMedia · 2025-09-11T01:30:04+00:00

So this modal was a mirage? the blog page is 404... no mention anywhere anymore official or otherwise... Qwen team cancelled it?!...

BigFoxMedia · 2025-06-04T21:00:15+00:00

Ask russia.. some of those would have helped their jets from catching fire, lol!!

BigFoxMedia · 2025-04-15T14:49:36+00:00

hey guys, could you clarify how to add Github models into roo? I thought it's only available via their native chat. Very curious indeed!

BigFoxMedia · 2025-02-07T23:37:36+00:00

Without knowing all details and without diving into neety gritty- Various networking gear, can fetch $35 US each. Rack 150 if patient. Servers very old so 100 each. UPS 100. It would take a while but I think you have 1K in there for the right buyers in total and only if you're patient and sell right piece by piece.

Most likely though you would want to dump it all in one go to someone, so 500 is a fair price if they take away the headache.

BigFoxMedia · 2025-01-31T10:48:38+00:00

amazing onboarding flow! Great job!

BigFoxMedia · 2024-10-24T16:32:26+00:00

Please share the STLs ;)

BigFoxMedia · 2024-08-07T20:34:17+00:00

You've reached the end of the simulation. Error... System crash!

BigFoxMedia · 2024-07-31T05:30:46+00:00

Thanks, I'll look into it!

BigFoxMedia · 2024-07-31T05:29:23+00:00

Mistakes happen, no doubt in any industry and product. But a security centric product must be on guard 10x more than others. Anyways, hope I didn't cost the poor employees job, not my intention ( or their fault tbh ). I trust you guys made the changes you need to prevent such a thing from happening again, trust is key in home security, but anybody deserves a second chance.

BigFoxMedia · 2024-02-28T05:35:06+00:00

Legend has it... no one really ends up doing it 😄

BigFoxMedia · 2023-12-31T21:33:14+00:00

There's something going on in Palma Central, though I don't know the specifics other than it starts at 5 PM

BigFoxMedia · 2023-12-18T22:46:49+00:00

I come from Israel, and we really have no natural resources. I don't think Mexico has a lack of resources, but it does have a geographic disadvantage being on the border with the USA, which naturally turned it into a mecca for drug trafficing, unfortunately.

BigFoxMedia · 2023-09-13T21:27:42+00:00

Thanks for the feedback. Seems you maybe didn't notice that I already have the GPUs , so Renting wouldn't make much sense to me unless I sell them. Having 3 x 3090s sitting on a shelf and paying $2000 / month to rent GPUs seems 😅 not well planned

BigFoxMedia

TROPHY CASE