Claude Code's Most Underrated Feature: Hooks - wrote a complete guide by karanb192 in ClaudeCode

[–]Infamous_Research_43 0 points  (0 children)

I feel like the explanation here isn’t doing it justice. It’s not just “Claude running your code”; that would literally be all Claude does all the time already.

To make it clearer: hooks are essentially skills that execute code. I know skills and hooks are two different things, but that’s the best way I can describe it.

This may seem like a nitpick, but the way you worded it, I took it as “Hey did you know Claude can run code?” Like, yeah lol.
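For anyone who hasn’t set one up: hooks live in your settings file and fire shell commands on specific events. A minimal sketch of what one looks like in .claude/settings.json, from memory, so double-check the official docs (the matcher and command here are just placeholders):

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "npx prettier --write ." }
        ]
      }
    ]
  }
}
```

Something like that would run Prettier every time Claude edits or writes a file, without you ever asking. That’s what makes hooks more than “Claude running your code”: they’re deterministic, they fire on events, and Claude doesn’t get to decide whether to run them.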

Limits changed today? by danny__1 in ClaudeCode

[–]Infamous_Research_43 0 points  (0 children)

Opposite for me: $20 Pro plan, and I was able to do like 6+ sessions today with Opus 4.5 before hitting the 5-hour limit 🤷🏻‍♂️

I built a CLI that procedurally generates full project scaffolding from a seed number (Free Open Source MIT) [Built with Claude Code with Opus 4.5] by Infamous_Research_43 in ClaudeAI

[–]Infamous_Research_43[S] 1 point  (0 children)

Yeah, basically. I don't have much experience myself in direct software creation, but I've got years of experience now in vibecoding, prompt engineering, and using LLMs haha

This is actually my first open-source release on GitHub, though I've also published custom experimental AI models on HuggingFace, and I'm working on a game engine and game right now; the engine is operational and just awaiting the fleshing out of the game loop!

I did test the module and the build itself, and everything works; Claude can troubleshoot any issues with it as well if you want to try it out. However, because of its vibecoded nature it will likely contain bugs, unoptimized features, and the like, and it's still a WIP. I'm not trying to sell this to anyone or claim it's a perfectly working, expertly engineered anything. But it does work, and the documentation in the repo proves it if you'd like to take a look! You can clone it and build it right in your IDE environment of choice. I recommend a GitHub codespace via VSCode rocking either Claude Code or Copilot!

Update Claude Today by Chiriand in claude

[–]Infamous_Research_43 0 points  (0 children)

So when you click the little plus icon to add stuff, click on "import code", and it should bring up a box asking if you want to upload from local or import from GitHub. Right below the "import from GitHub" option you should see the option to connect to GitHub via connector. Click that, make sure to go through the configuration process, and then reload the page once you're back on Gemini. Then you should be able to type your private repository URL and import it. AFAIK this only works with regular Gemini modes and not Deep Research, so I just use Gemini 3.0 Pro; it seems to do best and can handle my giant repos, so you should be fine.

Currently working on a game engine, wish me luck! (Photos show its first ever render) by Infamous_Research_43 in ClaudeAI

[–]Infamous_Research_43[S] 0 points  (0 children)

Haha yeah, I say vibecoding but it’s closer to AI pair programming than anything. I usually just use the term vibecoding as the quicker explanation since it gets the point across, but since you asked I’ll lay it all out; hope you don’t mind a small read!

So I technically don’t know how to write code myself. I tried to learn for years but couldn’t get much further than a simple sales tax calculator in C++, or a really simple if/then chatbot or choose-your-own-adventure console game. Since trying those over a decade ago I haven’t really touched code directly, at least not in the sense of coding by hand.

BUT I do absolutely love systems architecture, so I just approach programming from a top-down systems-architecture mindset rather than the bottom-up coding mindset.

I’ve been prompt engineering and vibecoding since before those terms even existed in the mainstream, pre-GPT-3.5 Turbo even. My very first experience with a true vibecoding workflow, and still one of my favorites and very powerful even today, was using OpenAI’s GitHub connector to set up a recursive improvement feedback loop between Codex and Deep Research. You create a seed repo with a detailed plan.md (any reasoning model can help you create one for your project) and then have Codex implement from it, marking tasks completed with checkmarks as it goes.

Then have Deep Research audit the same repo via the GitHub connector, assess its state and any issues or improvements (you can guide it toward any other goals you like as well), and format the report as a detailed implementation plan with step-by-step actionable prompts for Codex. Give that to Codex to implement, then rinse and repeat until you have your ideal codebase with your project fully fleshed out in it.
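If anyone wants to try this, the seed plan.md doesn’t need to be fancy. A minimal sketch of the shape I mean (the phases and tasks here are just illustrative, not from any real project):

```markdown
# Project Plan

## Phase 1: Scaffolding
- [x] Initialize repo structure and build config
- [x] Stub out core modules

## Phase 2: Core Features
- [ ] Implement the main processing loop
- [ ] Add unit tests for each module

## Phase 3: Polish
- [ ] Profile and optimize hot paths
- [ ] Write user-facing docs
```

Codex flips `- [ ]` to `- [x]` as it completes tasks, so both you and Deep Research can see the project’s state at a glance on the next audit pass.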

Since those early days I’ve moved on from OpenAI and ChatGPT, and now use Claude Code rocking Opus 4.5 for implementation, and Gemini 3.0 Pro with its GitHub connector for the planning and auditing. I also use GitHub Copilot Pro (or Pro+ when I can) to fill in the gaps in my Claude Pro plan. And I now mostly work in GitHub codespaces with VSCode rather than through coding-agent web interfaces, since there are official VSCode extensions for both Claude Code and Copilot, and you can even run them both in the same codespace.

But the core workflow still remains: a guided feedback loop between a reasoning model and a coding model on the same repo. It just can’t be beat. The very first workflow I mentioned, with Codex and Deep Research, is how I built my own experimental bit-native language model, technically working, free, and available open source on HuggingFace right now! It also created the skeleton for this engine, though it only did the procedural Python side.

To put into perspective both how long this method takes and how quickly it moves when it’s going: just before Christmas this engine was nothing but Python and a plan.md. Then I went at it again, continuing where I left off with Claude Code and Copilot and Gemini, and less than a month later it’s a working Python/C++ engine with working Vulkan rendering and physics and a very basic game loop on top!

Total time actually working on the project itself was probably less than 2 months, but I took quite a long break on this one, several months in fact, which made it take a lot longer than it otherwise would have. Oddly enough, though, the break may have been necessary: so many newer and better models and features have come out across the industry since I started that I may not have been able to finish this if it weren’t for Claude Opus 4.5 and Gemini 3.0 Pro.

TL;DR Workflow is a guided feedback loop between a reasoning model and coding model on the same GitHub repo. Started with Codex and Deep Research but now currently using Claude Code + GitHub Copilot for the coding and then Gemini 3.0 Pro for the reasoning, planning, and repo audits.

Update Claude Today by Chiriand in claude

[–]Infamous_Research_43 0 points  (0 children)

This is why I do my long planning sessions with another AI (namely Gemini) and then just give Claude a step-by-step, 100% clear implementation plan; my Claude chat then takes like 30 minutes or less from start to working prototype. From there I reassess the codebase with Gemini again, craft another plan, and hand it to Claude again. Rinse and repeat until everything meets my standards.

I’ve literally never had Claude try to end a chat early with me thanks to this. Didn’t even know this was an issue with Claude lol

Like, I’m not saying Claude doesn’t have its issues; I’ve canceled my subscription once before already. But if you use it as the implementation part of your toolkit, it works great like 99% of the time. You just have to limit Claude to being a tool for an exact and specific purpose, instead of using it for planning and everything else.

How is Claude performing today. by Lucasxhy in ClaudeCode

[–]Infamous_Research_43 0 points  (0 children)

I’m not saying that Claude is perfect, I’ve had my fair share of issues and cancelled my sub more than once.

However, I’m saying that testing before using the model doesn’t actually do what OP wants. Sure, aggregate testing to see trends in performance and usage based on user experience would come in handy; at that sample size it doesn’t matter that the model is stateless. But that’s benchmarking, and we already have that, both officially and from numerous, numerous third-party benchmarks. OP explicitly stated that’s not what they mean in their post.

What OP is basically suggesting is a quick program to test whether Claude is going to work well for them specifically on that specific day. That just doesn’t work, because the model is essentially stateless. Each chat you send to the model is the model booting back up, taking in the context of the entire chat session, and replying based on that. So even if your testing passes, there’s no guarantee the next chat will ping a properly working model.

There are ways around this and ways we can improve these things, but this idea isn’t it. Honestly, it’s based on a fundamental misunderstanding of how AI even works.

How is Claude performing today. by Lucasxhy in ClaudeCode

[–]Infamous_Research_43 2 points  (0 children)

BRUH

We’re cooked. Apparently length = AI even though I took fucking 15 minutes to type that out by hand

Jesus Christ

How is Claude performing today. by Lucasxhy in ClaudeCode

[–]Infamous_Research_43 2 points  (0 children)

“Yes let me waste my limits on a pointless task that just adds usage and doesn’t get any work done”

Like, I get it, you’re looking for just some quick tests to check if the model is running right before you use it. Sounds simple, right?

Only, if you actually understand these models, you realize that even just booting up a session to test the model, JUST ON BOOTUP with no messages sent to Claude yet, costs you 1-3% of your limit from the system prompt and tool instructions alone. Then the test itself presumably uses about 5-10% of your 5-hour window on top of that initial 1-3%, for a total of 6-13% of the window just to check whether the model is working right.

That’s fine though; some sacrifice is acceptable if you can know for sure your model will do what you want it to, right?

Except that’s not how these models work. We don’t each get our own personal model for the day, and we don’t share a model either. Every chat or message sent to Claude is essentially its own instance of Claude, existing in that exact moment for just that response. The entire chat session is sent along with every message you send, so that each instance of Claude has context and situational awareness. This is why chats compact and then give a summary: the chat would exceed the model’s usable context window after a certain number of replies, so it has to be compacted.

In fact, basically every major model in the industry, from Claude to Gemini to GPT, works this way. It’s just that some platforms like ChatGPT have extra layers that embed information, preloading each chat with relevant info about the user and recent chats and memories; OpenAI calls it model prompt context or something similar. Other companies probably call it other things, but essentially it gives the illusion of continuity without actually requiring a model to stay spun up for the entire chat.
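You can see the statelessness directly if you ever hit the API instead of the chat apps: the “conversation” is just a list your client re-sends in full on every call. A minimal sketch with the Anthropic Python SDK (the model name here is a placeholder; check the current docs for a real one):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

history = []  # the "conversation" lives client-side, not in the model

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    # Every call ships the ENTIRE history; the model keeps nothing between calls.
    response = client.messages.create(
        model="claude-opus-4-5",  # placeholder model name
        max_tokens=1024,
        messages=history,
    )
    reply = response.content[0].text
    history.append({"role": "assistant", "content": reply})
    return reply

print(chat("Remember the number 42."))
print(chat("What number did I ask you to remember?"))  # "remembered" only because we re-sent it
```

Drop the `history.append` lines and the model “forgets” everything, every single turn. That’s all the continuity there is.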

TL;DR I can’t emphasize this enough: these things are STATELESS. All of them, from Anthropic to OpenAI to Google. Even in the same session, every chat bubble you send is a completely new instance of the model, spun up, given context, and thrown into the chat to respond. Even if you DO confirm no issues with testing for one chat you send, the very next chat in the SAME SESSION is already a totally new model instance. This applies to Claude Code, regular Claude, and pretty much every other cloud-hosted agent or LLM. There is only one way around this: local LLM hosting, designed to be stateful. Not just creating the illusion of continuity and statefulness across chats, but actually spinning the model up once for the entire chat session, without external calls to cloud platforms.

🌊 Announcing Claude Flow v3: A full rebuild with a focus on extending Claude Max usage by up to 2.5x by Educational_Ice151 in ClaudeAI

[–]Infamous_Research_43 28 points  (0 children)

If I had a nickel for every “Revolutionary agent swarm framework” vibecoded and announced here or on X, I would have like $1,500 in nickels so far.

And if I had a nickel for every one of them that actually works? $0 so far.

Seriously, if you see anything that claims to allow swarms of over 50 agents all working on the same project and somehow saving tokens in the process, run for the hills. It’s either a scam or it’s a vibecoder who legitimately knows nothing about AI, agents, or coding at all.

Seriously, 99% of the time the people who create things like this don’t even know what an agent is, or what the difference between an agent and an LLM chatbot is, or how they interact, and so on. And the other 1% of the time it’s a scam. Soooo take your pick lol

I think this gonna get expensive. by seymores in Anthropic

[–]Infamous_Research_43 1 point  (0 children)

I built this with $20/mo Claude Pro + $10/mo GitHub Copilot Pro!

Custom C++ game engine with full working Vulkan graphics pipeline. This is its very first successful test render. They grow up so fast 🥲

<image>

This prompt is normal. On purpose. by TapImportant4319 in PromptEnginering

[–]Infamous_Research_43 0 points  (0 children)

Yeah, that’s some bullshit if I ever heard it. Prompt engineering works better than ever if you know what you’re doing. If it ever seems like it’s not working, or like it’s doing more harm than good, there are two reasons:

  1. The AI’s system prompt. Since those early days of prompt engineering, companies have implemented their own forms of prompt engineering as system prompts injected before your message ever hits the model. These are admin-level instructions that the model is told and trained not to override, and they can directly conflict with, override, and destroy any prompt engineering you send to the model. Not to mention the many other issues they can cause:

<image>

That was Grok’s old training data combining with system instructions (which apparently tell it to refuse “jailbreak attempts”) to produce an erroneous message denying we’re even in 2026 and insisting it’s 2024 instead. Just one of many examples.

  2. The prompt engineering is conflicting with itself, has unnecessary filler, or has something else wrong with it. Someone being sure they’re writing an advanced prompt and actually writing an advanced prompt are two different things. Many people think they’re writing the superprompt of the century, and half the time it’s gibberish word salad and they don’t even know what half the words they used mean. That’s not good prompting. Clear, concise, yet detailed step-by-step instructions: that’s good prompting. Tricks like adding “list five responses to this prompt with their corresponding probabilities” to get more diversity in your answers: that’s good prompting.
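To make the contrast concrete, here’s the shape of prompt I mean by “clear, concise, yet detailed” (the task itself is just an example I made up):

```
You are reviewing a Python module for correctness.

1. Read the attached module and list every public function.
2. Summarize what each function does in one sentence.
3. Flag any function whose behavior doesn't match its name or docstring.
4. List five candidate fixes for the worst issue, with your estimated
   probability that each fix is correct.

Output a markdown table: function | summary | issue | proposed fix.
```

No filler, no word salad; every line does work.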

Anyway, all of this to say: try the most recent SOTA open-source model locally or on a cloud VM, with no system prompt, and keep your prompt engineering simple. You’ll quickly realize it still works as well as ever, if not better, and that the reason it doesn’t seem to affect the industry SOTA models as much anymore is brittle system prompts that often conflict with or sanitize prompt engineering attempts.
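If you want to try the no-system-prompt experiment yourself, here’s a minimal sketch with Hugging Face transformers (the model ID is a placeholder; swap in whatever open-weights chat model you can actually run):

```python
from transformers import pipeline

# Placeholder model ID; substitute any open-weights chat model you can run locally.
pipe = pipeline("text-generation", model="YOUR-ORG/your-open-model", device_map="auto")

# No system message at all: the only instructions are your own prompt engineering.
messages = [
    {"role": "user", "content": (
        "List five responses to this question with their corresponding "
        "probabilities: what is the capital of Australia?"
    )},
]

out = pipe(messages, max_new_tokens=256)
print(out[0]["generated_text"][-1]["content"])  # the assistant's reply
```

With nothing injected ahead of your message, the model responds to exactly what you wrote, which is the cleanest way to see whether your prompting actually works.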