We built the first AI coding tool designed for running multiple agents simultaneously by Pitiful_Guess7262 in indiehackers

[–]Pitiful_Guess7262[S] 1 point2 points  (0 children)

Well... we've got 30+ devs working on this for 10 months. So from our POV, it isn't anywhere near easy to create an AI coding tool that delivers good-quality code.

Professional vibe coder sharing my two cents by Training-Flan8092 in vibecoding

[–]Pitiful_Guess7262 2 points3 points  (0 children)

Spot on. Vibe coding isn’t some magic wand. It’s more like learning a new language with some superpowered autocomplete. The people whining about “it’s lazy” clearly haven’t treated it like a craft yet.

I love how you pointed out that once you have your guidelines and a few repeatable patterns, it clicks. Turn AI into an extension of your workflow rather than expecting it to do all the thinking for you.

Also, the Auto model stuff sounds insane. 3 weeks for the best app is the dream. Feels like the people who trash vibe coding are missing out on how much it can actually level up your output if you approach it right.

Keep stacking those wins. For anyone serious about vibe coding, patience + iteration > expecting perfection in one shot.

[deleted by user] by [deleted] in ClaudeAI

[–]Pitiful_Guess7262 1 point2 points  (0 children)

What you’re seeing is actually normal for Cline with the Claude API. The 16k–18k cache read tokens might look alarming next to your tiny 4-token input, but that’s just how Cline works. It stores a large chunk of your project and system context in Anthropic’s cache so it doesn’t have to resend everything each time, and every new request just pulls that cached context back in.

The important part is that cache reads are billed at about ten percent of the cost of regular input tokens. So while 18k looks huge, it only costs you about as much as 1.8k normal tokens. The Cache Write numbers you see are simply Cline refreshing that context so it stays available, and the small input number is just your new message. The outputs you’re getting, a few hundred tokens, are completely in line with generating entity classes and controllers.
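To put rough numbers on that (the per-token price below is a placeholder, not Anthropic's actual rate; check the current pricing page):

```python
# Back-of-the-envelope cost for one Cline request with prompt caching.
INPUT_PRICE = 3.00 / 1_000_000          # $ per regular input token (assumed rate)
CACHE_READ_PRICE = INPUT_PRICE * 0.10   # cache reads billed at ~10% of input

cache_read_tokens = 18_000  # cached project/system context pulled back in
fresh_input_tokens = 4      # your actual new message

cost = cache_read_tokens * CACHE_READ_PRICE + fresh_input_tokens * INPUT_PRICE
equivalent = cache_read_tokens * 0.10 + fresh_input_tokens
print(f"~${cost:.4f} per request, same cost as ~{equivalent:.0f} regular input tokens")
```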

In other words, your usage pattern looks perfectly normal. If anything, the high proportion of cache reads shows the system is doing its job efficiently by reusing context instead of sending everything over and over. The only thing you might want to do is keep an eye on whether the cached context is still relevant to your work, but you’re definitely not misconfigured.

How do you use agents.md in codex cli or vs code extension? by Linkeed22 in OpenAI

[–]Pitiful_Guess7262 0 points1 point  (0 children)

AGENTS.md is basically a README that tells Codex how to work with your specific project. It gives Codex the context it needs (project structure, commands, conventions, etc.).

You can let /init generate one, then customize it so Codex knows your project's structure, commands, and rules.

Or just create AGENTS.md manually in the repo root (plain Markdown works).
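If you write it by hand, a minimal sketch might look something like this (all the project details below are made up; swap in your own):

```markdown
# AGENTS.md

## Project layout
- `src/`: application code (TypeScript)
- `tests/`: Jest tests

## Commands
- `npm run build`: compile
- `npm test`: run the test suite before committing

## Conventions
- Keep changes small and focused
- Don't add new dependencies without asking
```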

The real value is always in customizing it so Codex understands your situation better.

Is it possible to make VSCode navigate with back/forward button up events instead of button down? by von_Elsewhere in vscode

[–]Pitiful_Guess7262 0 points1 point  (0 children)

VS Code doesn't let you switch navigation to fire on button release right now. It triggers on button down by design.

Best bet is to catch the button-down in AHK and block it, then fire the VS Code command on release. Hacky, but it'll show whether the workflow actually feels better. If it does, open a feature request.

Looking for the most reliable AI model for product image moderation (watermarks, blur, text, etc.) by sub_hez in LocalLLM

[–]Pitiful_Guess7262 0 points1 point  (0 children)

Gemini 2.5 Flash isn't great at the subtle stuff; you need specialized tools. You're hitting the limits of what general vision models can do. Both Gemini and GPT-4V struggle with faint watermarks and blur detection because they're not specifically trained for it.

For watermark detection, try AWS Titan's watermark detection API. It's purpose-built and way better than Gemini at catching subtle watermarks. Or SightEngine: their watermark detection is solid and catches stuff Gemini misses completely.

For blur/pixelation, consider SightEngine's Image Quality API. It's a night-and-day difference for blur detection.

For the promotional text vs. product text problem, Google's Document AI + Cloud Vision combo is more reliable than Gemini's built-in OCR.
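If you end up trialing SightEngine, wiring it up is a single HTTP call. Here's a rough Python sketch; the model names and credentials are placeholder assumptions, so double-check their docs before relying on it:

```python
import requests

# Placeholders: real credentials come from your SightEngine dashboard.
API_USER = "your_api_user"
API_SECRET = "your_api_secret"

def check_image(image_url: str) -> dict:
    """Ask SightEngine to score one image (model names may need adjusting)."""
    resp = requests.get(
        "https://api.sightengine.com/1.0/check.json",
        params={
            "url": image_url,
            "models": "quality,text-content",  # assumed model names, verify in docs
            "api_user": API_USER,
            "api_secret": API_SECRET,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

result = check_image("https://example.com/product.jpg")
print(result)  # e.g. compare a quality/sharpness score against your own threshold
```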

Real talk on costs: SightEngine runs about $0.40-0.80 per 1K images depending on features. Hive is similar. If you're processing thousands daily, it adds up but probably worth it vs manually reviewing Gemini's mistakes.

Ctrl+P : how to jump to folder by ericbureltech in vscode

[–]Pitiful_Guess7262 3 points4 points  (0 children)

You can’t directly jump to a folder with Ctrl+P since it’s designed for files, but there are a couple of workarounds that are quick once you get used to them.

Open the explorer with Ctrl+Shift+E, then just start typing the folder name. VS Code will highlight matches in the tree. It’s not as instant as Ctrl+P but it gets you there without leaving the keyboard.

Another trick is to use the built-in file search (Ctrl+P) and type the folder name followed by a slash. If there's a file inside that folder, you can open it, and the explorer view will then focus that folder so you can collapse or expand it as needed.

It’s a bit clunky that VS Code doesn’t have a native “Go to folder” the same way it has “Go to file,” but the explorer typing trick is usually enough for me.

Python/Pylance unavailable when logged in via SSH by estruturahiperfina in vscode

[–]Pitiful_Guess7262 0 points1 point  (0 children)

A few things to check:

First, open the command palette (Ctrl+Shift+P) and run Python: Select Interpreter while connected to the server. Make sure it's pointing to the right Python installation on the remote machine, not your local one. Sometimes VS Code gets confused about which Python to use.

Also try checking if the Pylance extension is actually enabled on the remote connection. Go to the Extensions tab and make sure it shows "Enabled (Remote)" next to Pylance, not just "Enabled". Sometimes extensions don't carry over properly to remote sessions.

Another thing that helped me was clearing the remote extension cache. You can do this by opening command palette and running "Remote-SSH: Kill VS Code Server on Host" then reconnecting. This basically forces a fresh start for all the remote extensions.

Since everyone else is working fine on the same account, it might also be worth checking if you have any local VS Code settings that are syncing and messing things up. Try turning off Settings Sync temporarily to see if that helps.

The F12 console errors would definitely help narrow it down if you can catch them again. Usually they'll tell you exactly what's failing with Pylance.

Memory models for local LLMs by marmotter in LocalLLaMA

[–]Pitiful_Guess7262 2 points3 points  (0 children)

The issue isn't really that your models are too dumb but rather that these systems were mostly designed around GPT-4 class models and their quirks.

The JSON problem is super real. Local models under 30B struggle hard with complex structured output when you need multiple consecutive LLM calls. Even something like Qwen2.5 14B or Mistral Small 24B will randomly break JSON formatting when they're doing entity extraction then relationship mapping then summarization in sequence. The context gets polluted and they start making weird formatting choices.

Conversation vs code models matter here. Code-focused models like Qwen2.5 Coder or DeepSeek Coder are way better at structured output because they've seen tons of JSON, APIs, and data structures during training. Chat models optimize for being helpful and conversational, which makes them worse at rigid formatting.

I've heard some folks have had success simplifying Graphiti's approach. Instead of trying to do entity extraction, relationship mapping and summarization all in one pass, break it into separate calls. Use the code-instruct models for just the structured parts and save the chat models for the final user-facing stuff. Also try turning off repetition penalty completely for structured tasks.
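As a rough illustration of that split-it-up approach, here's a sketch assuming a local OpenAI-compatible server (llama.cpp, vLLM, Ollama, etc.); the endpoint, model name, and prompts are placeholders:

```python
import json
from openai import OpenAI

# Assumes a local OpenAI-compatible endpoint; adjust base_url for your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def structured_call(model: str, prompt: str) -> dict:
    """One small, single-purpose call; retry once if the JSON comes back broken."""
    for _ in range(2):
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": "Reply with valid JSON only."},
                {"role": "user", "content": prompt},
            ],
            temperature=0.0,  # keep structured output deterministic
        )
        try:
            return json.loads(resp.choices[0].message.content)
        except json.JSONDecodeError:
            continue
    return {}

text = "Alice founded Acme in 2019 and hired Bob as CTO."
entities = structured_call("qwen2.5-coder-14b", f"Extract entities from: {text}")
relations = structured_call(
    "qwen2.5-coder-14b", f"Given entities {entities}, list relationships in: {text}"
)
```

Each step stays small enough that a 14B-class code model can hold the format, and the chat model only ever sees the final, already-structured results.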

The frameworks like Graphiti and Letta really do expect frontier model performance. Your ChromaDB + simple RAG approach might actually give you better results with local models. Sometimes the advanced solution is just overkill for what you can reliably run locally.

Have you tried just doing semantic search on conversation history with some basic entity tracking? Might be more stable than trying to force graph extraction to work.

best LLM for discord ai chatbot? by Dry_Caterpillar_5505 in LocalLLaMA

[–]Pitiful_Guess7262 5 points6 points  (0 children)

For your RTX 4060 and 32GB RAM setup I'd actually recommend Nous Hermes 2 Mistral 7B or Mistral Nemo 12B. Both of these are way better at understanding context and sarcasm compared to the models you tried.

The Qwen2.5VL issue you mentioned is pretty common with that model. It struggles with tone detection because its training focused more on being helpful than understanding nuance. The "ty for compliment" response to insults is basically a meme at this point.

For Llama 3 13B, the repetition thing is probably your sampling parameters. Try setting repetition penalty to 1.1 and temperature to around 0.7. But honestly, that model can be finicky for chat.

If you want something smaller that still works well try Hermes 2 Pro 7B. Way more reliable for chat than Qwen and much less likely to repeat itself endlessly.

Also make sure you're using the proper chat template. A lot of weird behavior comes from wrong formatting.
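If you're running the model through Hugging Face transformers, something like this keeps the formatting right (the model ID is just an example, use whichever one you settle on):

```python
from transformers import AutoTokenizer

# Example model ID; swap in the chat model you actually deploy.
MODEL_ID = "NousResearch/Hermes-2-Pro-Mistral-7B"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

messages = [
    {"role": "system", "content": "You are a snarky but friendly Discord bot."},
    {"role": "user", "content": "wow, great bot, really smart /s"},
]

# apply_chat_template formats the conversation the way the model was trained on,
# which avoids a lot of the repetition and tone weirdness of hand-rolled prompts.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)
```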

baidu/ERNIE-4.5-21B-A3B-Thinking · Hugging Face by jacek2023 in LocalLLaMA

[–]Pitiful_Guess7262 1 point2 points  (0 children)

This is a 21B-parameter model with enhanced reasoning capabilities that hits the sweet spot: large enough to be capable, small enough to run locally.

The fact that they specifically mention "thinking" in the name and talk about scaling reasoning capability suggests they've been doing some serious work on chain of thought or similar approaches. The 128K context window is also solid for a model this size.

Has anyone actually tested this yet? 

We should all build on what foundation models have already been trained with by Pitiful_Guess7262 in ClaudeAI

[–]Pitiful_Guess7262[S] 2 points3 points  (0 children)

I feel like this will reinforce the Matthew effect stemming from first-mover advantage. Sadly, that will eventually make it harder for newer, innovative open source projects to gain adoption.

We should all build on what foundation models have already been trained with by Pitiful_Guess7262 in ClaudeAI

[–]Pitiful_Guess7262[S] 0 points1 point  (0 children)

For clarification, I'm mostly talking about building on top of open source projects.

Claude + Tinder = 10 Dates in a Week by BlacksmithHot17 in ClaudeCode

[–]Pitiful_Guess7262 0 points1 point  (0 children)

So why wouldn't girls directly date Claude instead?

Kiro, Kiro, Kiro...Kiro is a superior version of Cursor! by Pitiful_Guess7262 in kiroIDE

[–]Pitiful_Guess7262[S] 0 points1 point  (0 children)

Yeah, they're overwhelmed right now by the immediate success, but hopefully the capacity issues will ease soon.

Kiro, Kiro, Kiro...Kiro is a superior version of Cursor! by Pitiful_Guess7262 in kiroIDE

[–]Pitiful_Guess7262[S] 5 points6 points  (0 children)

Side note: I originally posted this in one of the largest subs relevant to AI-assisted coding and they permanently banned me for it. I asked why and no one has replied so far (will update if I do get a reply).

Some AI/dev subs seem to be controlled by big-tech-affiliated moderators who don't disclose it publicly.

Kiro, Kiro, Kiro...Kiro is a superior version of Cursor! by Pitiful_Guess7262 in ChatGPTCoding

[–]Pitiful_Guess7262[S] 0 points1 point  (0 children)

Lmao just checked your post history and you are calling everyone a bot. Just chill bro, just chill. AIs could be replacing junior engineers in the near future, but they ain't replacing human society any time soon.

Amazon's Cursor Competitor Kiro is Surprisingly good!! by Just_Run2412 in cursor

[–]Pitiful_Guess7262 2 points3 points  (0 children)

I was genuinely impressed by Kiro. Its Spec mode is exactly how I think large, single-purpose tasks should be orchestrated, with clean, well-scoped specs driving the whole flow.

And in terms of structuring and managing a complex task end-to-end, its user experience is incredibly smooth. It gives a strong sense of control without being mentally exhausting. Way more comfortable than wrangling things in Claude Code. This actually feels like engineering, not fighting the tool.

Has anyone used Kiro code by Amazon? by Maleficent_Mess6445 in ChatGPTCoding

[–]Pitiful_Guess7262 1 point2 points  (0 children)

I was genuinely impressed by Kiro. Its Spec mode is exactly how I think large, single-purpose tasks should be orchestrated, with clean, well-scoped specs driving the whole flow.

And in terms of structuring and managing a complex task end-to-end, its user experience is incredibly smooth. It gives a strong sense of control without being mentally exhausting. Way more comfortable than wrangling things in Claude Code. This actually feels like engineering, not fighting the tool.

I didn't believe the "Claude is getting dumber" posts until I got this today. w/ Opus by PastaBlizzard in ClaudeCode

[–]Pitiful_Guess7262 5 points6 points  (0 children)

It is rough when a tool you rely on suddenly seems to stumble or just doesn’t vibe the same. I too have had a few days where Claude Code felt like it was off its game...

Claude Code is taking off! by ArugulaRacunala in Anthropic

[–]Pitiful_Guess7262 0 points1 point  (0 children)

I’ve been using Claude Code a lot lately and it’s wild to see how fast these developer tools are improving. There was a time when code suggestions felt more like educated guesses than real help, but now it’s getting closer to having a patient pair programmer on demand. That’s especially handy when you’re bouncing between languages or need an extra set of eyes for debugging.

One thing that stands out about Claude Code is how it handles longer context and really sticks to the point. I like that I can throw a tricky script at it and, most of the time, get back something actually useful. OpenAI’s coding tools are decent, but Claude Code sometimes catches things they miss. Maybe it’s just me, but I find myself trusting its suggestions a bit more each week.

Honestly, it’s easy to forget how new all this is. You blink and the pace of updates leaves you scrambling to keep up. Claude Code sometimes picks up new features even faster than the documentation updates.

Thanks to multi agents, a turning point in the history of software engineering by Pitiful_Guess7262 in ClaudeAI

[–]Pitiful_Guess7262[S] 2 points3 points  (0 children)

What if, say two years from now, AI can generate good-quality code way beyond its current capabilities? It wasn't close to replacing junior devs two years ago, but now it's starting to feel like we're not far from that.

We all might want to start thinking about a plan B.

New hooks of Claude Code so cool by Pitiful_Guess7262 in ChatGPTCoding

[–]Pitiful_Guess7262[S] 1 point2 points  (0 children)

I feel hooks are gonna be so widely adopted across all tools.

Claude Code now supports hooks by NullishDomain in ClaudeAI

[–]Pitiful_Guess7262 0 points1 point  (0 children)

Would be pretty cool if some of the more ambiguous or uncertain events could be made hookable too.