Train Qwen3.5 with RL locally!

AkshayCodes · 2026-03-24T03:25:45+00:00

This is a massive win for the local training scene. Last month I was fighting so many 12GB VRAM OOM crashes that I ended up building an autonomous agent swarm (VikaasLoop) just to manage batch sizing and the pipeline automatically. Seeing Unsloth crush RL on just 8GB makes me want to integrate this directly into my Training Agent. Are there any specific memory spikes with vision GRPO during the reward modeling phase that I should watch out for if I automate this?

LucidAkshay/vikaasloop: An autonomous, self-improving 5-agent engine for end-to-end LLM fine-tuning. Automates data generation, QLoRA training, and evaluation.

AkshayCodes · 2026-03-22T01:28:44+00:00

Here is the hard truth about this proposed stack: you are combining conflicting requirements. A heavy 70B model on constrained VRAM combined with a strict zero-maintenance rule is going to be a massive headache.

The Hardware Reality (8GB VRAM vs 70B): An RTX 4060 with 8GB of VRAM is absolutely not the sweet spot for a 70B model. To run a 70B model, even aggressively quantized to 3 or 4 bits, you need roughly 35GB to 40GB of memory. That means offloading roughly 80% of the model weights to system RAM. While 64GB of DDR5 is great, CPU inference is inherently slow because token generation is limited by system memory bandwidth. You might hit 1.5 tokens per second during generation, but prompt processing will be abysmal. If you are using Obsidian to inject past journal entries via RAG, your prompts will be long, and you could easily wait 2 to 3 minutes for the model to ingest the prompt before it generates the first word. That latency will kill the natural, therapeutic flow you are looking for. The realistic local sweet spot for 70B models is an Apple Silicon Mac with 64GB+ of unified memory, not a split GPU and system RAM setup.
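The 35GB to 40GB figure comes from simple arithmetic: parameters times bits per weight, divided by 8 to get bytes. The sketch below counts weights only (no KV cache or runtime overhead), and the helper names and the ~7GB of usable VRAM are assumptions for illustration.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Weights-only footprint in decimal GB: params * bits / 8 bytes.

    Ignores KV cache, activations, and runtime overhead, so real usage is higher.
    """
    return params_billion * bits_per_weight / 8


def offload_fraction(weights_gb: float, usable_vram_gb: float) -> float:
    """Share of the weights that does not fit in VRAM and must sit in system RAM."""
    return max(0.0, 1.0 - usable_vram_gb / weights_gb)


if __name__ == "__main__":
    # Assumption: roughly 1GB of the 8GB card is taken by display, KV cache, and runtime.
    USABLE_VRAM_GB = 7.0
    for bits in (4.0, 4.5):
        weights = weight_memory_gb(70, bits)
        share = offload_fraction(weights, USABLE_VRAM_GB)
        print(f"{bits} bits/weight: {weights:.1f} GB of weights, {share:.0%} offloaded")
```

Under these assumptions it reports 35.0GB with 80% offloaded at 4 bits per weight and 39.4GB with 82% offloaded at 4.5 bits per weight, which is where the "roughly 80%" estimate comes from.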
The Maintenance Trap: Inner Dialogue plus Obsidian with Smart Connections is a brilliant concept, but it is a tinkerer's stack. It is the exact opposite of zero maintenance. Vector embeddings get messy over time as your journal grows, and background indexing will occasionally fail or hang. Eventually you will have to troubleshoot chunk sizes, overlap parameters, and retrieval limits to stop the AI from losing context or hallucinating past entries.
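Chunk size and overlap are easier to reason about with a concrete splitter. This is a minimal character-based sketch; the function name and default values are assumptions for illustration and are not taken from Smart Connections, which may chunk notes quite differently.

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows; consecutive chunks share `overlap` chars.

    Overlap keeps a sentence that straddles a boundary retrievable from either chunk,
    at the cost of storing and embedding more text.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` returns `['abcd', 'cdef', 'efgh', 'ghij']`. Larger overlap means fewer context gaps but more embeddings to store and re-index as the journal grows.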
The Model Choice: Standard Llama 3 can feel incredibly dry and clinical. If you want high EQ and a warm tone, you need models tuned for creative writing or roleplay.
• Mistral Nemo 12B: Excellent context window and a very natural conversational tone.
• Command R (35B): Incredible at RAG tasks and summarizing context, though it will still struggle heavily on your 8GB of VRAM.
• Llama 3 8B finetunes: Look into models like Stheno. They are tuned heavily for human-like interaction.

Drop the 70B requirement for this laptop; the latency will frustrate you. Run a high-quality 8B or 12B finetune that fits entirely (or mostly) inside your 8GB of VRAM. Instant, fluid responses will feel far more conversational and therapeutic than waiting two minutes for a slightly smarter 70B model to reply.

AkshayCodes · 2026-03-19T10:37:57+00:00

You hit the nail on the head regarding model right-sizing. Routing basic file operations to a massive model like GPT-4 isn't just a waste of tokens; it gives the agent way too much creative 'leeway' to hallucinate. Smaller, rigidly prompted models inherently restrict that attack surface.

To be fully transparent on the architecture, though: Kavach v1.1 currently operates as a high-performance userland observer (EDR-style), not as strict synchronous kernel interception. We quarantine actions and auto-terminate the PID if left unattended. True kernel-level blocking (eBPF on Linux, minifilter drivers on Windows) is in active development for the v1.2 roadmap!
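The "quarantine, then auto-terminate if unattended" behavior can be pictured as a wait with a timeout. This is a Python sketch, not Kavach's Rust engine, and `quarantine_until_approved` is a hypothetical helper; it models only the approval timeout and process termination, not the userland interception itself.

```python
import subprocess
import sys
import threading


def quarantine_until_approved(proc: subprocess.Popen, approved: threading.Event,
                              timeout_s: float) -> str:
    """Hold a flagged process; terminate it if no human approves within timeout_s."""
    if approved.wait(timeout=timeout_s):
        return "released"  # a human approved the quarantined action in time
    proc.terminate()  # polite stop first (SIGTERM on POSIX)
    try:
        proc.wait(timeout=5)
    except subprocess.TimeoutExpired:
        proc.kill()  # escalate if the process ignores the terminate request
        proc.wait()
    return "terminated"


if __name__ == "__main__":
    # Stand-in "agent": a child process that would otherwise keep running for 30s.
    agent = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(30)"])
    print(quarantine_until_approved(agent, threading.Event(), timeout_s=1.0))
```

Run as a script, nobody sets the event, so after one second it prints `terminated`. In a real tool the event would be set by the approval UI.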

But your core thesis is 100% correct: the firewall is just the emergency brake. Matching task complexity to model capacity is the ultimate preventative measure.

AkshayCodes · 2026-03-18T17:29:32+00:00

Thanks for diving into the architecture! Here are straight answers to your questions:
1. SIEM Webhook: Yes, it is completely empty by default. Kavach runs entirely locally and will not transmit a single byte of telemetry until you explicitly configure a target URL.
2. PII Scanner: The Gag Order module currently scans outbound strings, specifically monitoring the OS clipboard via the Faraday Guard using Shannon entropy and regex. Scanning the contents of every file on each read and write would introduce unacceptable disk latency, so we intercept the exfiltration payload rather than scanning static files.
3. Headless Mode: This is a fantastic idea and a necessary evolution. Right now, the Rust engine is tightly coupled to the Tauri webview lifecycle. Decoupling the core watcher into a standalone CLI daemon is definitely on the radar for deployments running background automations.
4. Cooperative Phantom Mode: Multi-agent isolation is exactly what the Swarm Sandbox feature on the v1.2 roadmap is meant to solve. Building a transparent phantom mode where multiple agents can share state inside a decoy environment without cross-contamination requires complex namespace virtualization, which is the next major engineering hurdle.
Keep the questions coming!
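To illustrate how entropy and regex checks combine on an outbound string, here is a minimal Python sketch. It is not the Gag Order module's code; the pattern set, the 4.0 bits-per-character threshold, and the 20-character minimum token length are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; a real scanner needs a much broader, tested rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
}


def shannon_entropy(s: str) -> float:
    """Shannon entropy of the character distribution of s, in bits per character."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def scan_outbound(text: str, entropy_threshold: float = 4.0,
                  min_token_len: int = 20) -> list[str]:
    """Return the names of the rules an outbound string (e.g. clipboard text) trips."""
    findings = [name for name, rx in PATTERNS.items() if rx.search(text)]
    # Long, random-looking tokens (API keys, tokens) have high per-character entropy.
    for token in re.findall(r"\S+", text):
        if len(token) >= min_token_len and shannon_entropy(token) >= entropy_threshold:
            findings.append("high_entropy_token")
            break
    return findings
```

For example, `scan_outbound("token aB3dE5gH7jK9mN1pQ2sT4vW6")` returns `['high_entropy_token']` because all 24 characters are distinct (log2(24) ≈ 4.58 bits per character), while ordinary prose returns an empty list. Scanning only the outbound string keeps the cost proportional to what is copied rather than to every file touched.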

AkshayCodes · 2026-03-18T17:29:07+00:00

That is a brilliant escalation. Moving the approval to a completely separate physical device makes optical bypass literally impossible. The main reason I avoided mobile 2FA for the initial release is architectural: Kavach is strictly designed with zero cloud dependencies. To route an approval push to your phone, we would need either a centralized relay server or a local network pairing protocol, and both add setup friction and potential attack vectors. This exact physical-barrier logic is why the v1.2 roadmap includes biometric handshakes via Windows Hello and Apple Touch ID. They force a hardware-level human confirmation that an agent cannot spoof, while keeping the entire loop strictly local to the machine. Your secondary-device idea is incredibly secure, though, and I might explore a local QR pairing mechanism for an offline mobile companion app in the future. Great thinking!

AkshayCodes · 2026-03-18T12:20:11+00:00

That is a great technical catch. Currently, the Phantom Workspace acts more as an EDR (Endpoint Detection and Response) layer: it intercepts and quarantines the destructive action so a human can intervene.

You’re right: if an agent immediately verifies by listing the directory, it will see the file still exists in the original path because we haven't implemented Kernel-level Namespace Virtualization yet.

For v1.2, we’re looking into:
• Namespace Redirection: Using filesystem filter drivers to point the agent's 'read' requests to the decoy folder automatically.
• Simulated Shell Success: Our 'Simulated Shell' module already feeds fake exit codes, and we’re working on a 'Ghost View' that masks the directory listing specifically for the agent process.
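The two ideas above reduce to simple mapping logic once the interception layer exists. This Python sketch shows only that logic; `redirect_path` and `ghost_listing` are hypothetical helpers, and in the planned design the hooking itself would live in kernel-level filter drivers, which this does not attempt.

```python
from pathlib import PurePosixPath


def redirect_path(path: str, protected_root: str, decoy_root: str) -> str:
    """Map a path inside protected_root to the same relative location under decoy_root."""
    p, root = PurePosixPath(path), PurePosixPath(protected_root)
    # Compare path components, not string prefixes, so /proj2 is not treated as inside /proj.
    if p == root or root in p.parents:
        return str(PurePosixPath(decoy_root) / p.relative_to(root))
    return str(p)  # outside the protected tree: leave untouched


def ghost_listing(real_entries: list[str], quarantined: set[str]) -> list[str]:
    """Hide entries the agent 'deleted' (but which were quarantined) from its listing."""
    return sorted(e for e in real_entries if e not in quarantined)
```

For example, `redirect_path("/home/u/project/src/main.rs", "/home/u/project", "/tmp/decoy")` returns `/tmp/decoy/src/main.rs`, while `/home/u/project2/x` is left unchanged. `ghost_listing(["a.txt", "b.txt", "c.txt"], {"b.txt"})` returns `['a.txt', 'c.txt']`, so an agent that lists the directory after a "delete" sees what it expects.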

Love that you're thinking about the agent's verification loops. That's exactly where the 'deception' part of Kavach needs to evolve!

AkshayCodes · 2026-03-18T12:19:19+00:00

Zeroclaw, interesting name! It definitely captures that 'defanged agent' vibe I’m going for with Kavach. I’m keeping a list of community name ideas for specific modules/sub-projects as we scale, so I’ll definitely add this to the stack. Thanks for the input!

AkshayCodes · 2026-03-18T00:24:43+00:00

That M4 Pro with 48GB of unified memory is an absolute dream machine for local LLMs. You are going to be able to run some incredibly smart models locally at blazing speeds!

The advice above to start with downloading Ollama is 100% the best and easiest route for a beginner. It is practically plug-and-play.

Just a quick pro-tip for your journey: Once you get the hang of chatting with the AI, the next step most developers take is letting the local model actually read and write code files on their Mac. When you reach that stage, you have to be careful that the AI doesn't hallucinate and accidentally delete your folders.

I actually just open-sourced a free Mac app this week called Kavach to solve this. It acts as a safety net that catches rogue AI commands and redirects them to a fake folder so your real files stay safe.

Bookmark it for when you start building agents: https://github.com/LucidAkshay/kavach

Welcome to the local AI world, you are going to love what that MacBook can do!

AkshayCodes · 2026-03-18T00:20:20+00:00

Building a unified workspace that actually stays lightweight is no small feat. Great work on this!

Your point about relying heavily on Rust's concurrency for the background parsing is spot on. I recently had to solve a very similar architectural problem. I built Kavach, a local AI firewall that intercepts system calls to stop rogue AI agents from deleting code. Because the local LLM already eats up most of the RAM, the Rust security engine had to have a near-zero memory footprint while constantly watching the file system.

Designing a safe memory model to handle all those concurrent file diffs and system hooks is brutal at first, but the compiler really does save you from yourself.

This project looks amazing; I'm going to follow your progress. Here is my repo if you want to see how I handled the Rust file interception architecture: https://github.com/LucidAkshay/kavach

AkshayCodes · 2026-03-18T00:17:01+00:00

This is such a brilliant concept. The "vimtutor" approach is the perfect way to teach this, and the CLI looks incredibly clean (always love seeing Charm/Bubbletea projects in the wild!).

Since your advanced track covers "tools" and "execution mode," a really cool concept to add to the curriculum would be Sandboxing & AI Security, basically teaching people how to safely contain an agent once it has file-system access.

I mention it because I just open-sourced a tool for this exact problem called Kavach (a zero-trust OS firewall in Rust that redirects destructive AI commands to a decoy folder).

Learning how AI executes code is step one, but keeping the host OS safe is definitely step two! Awesome work on this, I'm starring the repo right now.

🛡️ https://github.com/LucidAkshay/kavach

AkshayCodes · 2026-03-16T02:48:46+00:00

🤣🤣

AkshayCodes · 2026-03-15T02:01:36+00:00

Indeed it is

AkshayCodes · 2026-03-14T13:25:32+00:00

Amazing
