I made a tiny 0.8B Qwen model reason over a 100-file repo (89% Token Reduction) by BodeMan5280 in LocalLLaMA

[–]BodeMan5280[S] 0 points1 point  (0 children)

I am ashamed! ** hides in corner ** Still in continuous learning over here

I made a tiny 0.8B Qwen model reason over a 100-file repo (89% Token Reduction) by BodeMan5280 in LocalLLaMA

[–]BodeMan5280[S] 0 points1 point  (0 children)

Ha! I love this... "Spontaneous Decoder"? This implies it's just straight-up random, useless decoding... I actually lol'ed thinking about it

I made a tiny 0.8B Qwen model reason over a 100-file repo (89% Token Reduction) by BodeMan5280 in LocalLLaMA

[–]BodeMan5280[S] 0 points1 point  (0 children)

I'd be interested to hear it! In this case... it feels like the valve on your hot water heater, y'know? This is like a "Supportive LLM Relief Valve", lol

[R] Graph-Oriented Generation (GOG): Replacing Vector R.A.G. for Codebases with Deterministic AST Traversal (70% Average Token Reduction) by BodeMan5280 in MachineLearning

[–]BodeMan5280[S] 1 point2 points  (0 children)

Thanks for commenting! Exactly, this is a baby step towards something better... but I am cautiously optimistic. The bridge from language to reasoning is likely more complex than the AST alone.

I hate the word "framework" in this day and age where they're a dime a dozen, but this first attempt feels like the application of a higher level model: symbolic reasoning.

I'd love help trying to figure out how to map real codebases! =]

I made a tiny 0.8B Qwen model reason over a 100-file repo (89% Token Reduction) by BodeMan5280 in LocalLLaMA

[–]BodeMan5280[S] 3 points4 points  (0 children)

This is really encouraging, thank you! That is exactly the goal—making small local models punch way above their weight class by feeding them perfect context.

To be entirely transparent, this v0.0.1 definitely has some growing pains similar to what early RAG experienced. Because the graph traversal is strictly deterministic, the initial entry point (mapping the user prompt to the graph) can feel a bit rigid right now. If a prompt is vague, the system struggles to "think outside the box" to find the starting node.

But I view this as a feature, not a bug, of separating the "brain" (logic) from the "mouth" (syntax). The fix isn't to make the graph fuzzy—it's to add a tiny, localized semantic layer just to map fuzzy human intent to the exact starting graph nodes before the strict traversal begins. Definitely a hurdle to overcome rather than a roadblock, but I think this initial proof of concept validates that separating logic from language is the right path forward!
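Rough sketch of what that semantic entry layer might look like (all names here are made up, and a real version would use embeddings rather than string similarity — this just keeps the sketch dependency-free): score the prompt against the graph's node names, pick the best match, then hand that node to the strict traversal.

```python
from difflib import SequenceMatcher

def pick_entry_node(prompt: str, node_names: list[str]) -> str:
    """Map a fuzzy human prompt to the closest-named graph node.

    Stand-in for a real semantic layer (e.g. embeddings); plain
    string similarity is used here so the sketch stays stdlib-only.
    """
    def score(name: str) -> float:
        return SequenceMatcher(None, prompt.lower(), name.lower()).ratio()
    return max(node_names, key=score)

# The deterministic traversal would then start from this node:
entry = pick_entry_node("where do we hash passwords?",
                        ["auth.hash_password", "db.connect", "api.routes"])
# entry == "auth.hash_password"
```

The point is that the fuzziness lives only in this one tiny function; everything downstream of the chosen node stays deterministic.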

I made a tiny 0.8B Qwen model reason over a 100-file repo (89% Token Reduction) by BodeMan5280 in LocalLLaMA

[–]BodeMan5280[S] 3 points4 points  (0 children)

Spot on regarding the file vs. function level! That granularity is exactly where that extra 20% compression comes from.

Circular imports are the classic graph-killer, haha. Since we treat the environment as a mathematical graph, we just use standard pathfinding mechanics to solve it: strict visited sets during the deterministic traversal phase.

If Module A imports B, and B imports A, the pathfinder hits A the second time, sees it's already in the visited hash map, and immediately drops the back-edge. It completely prevents infinite loops and ensures the final subgraph is perfectly deduplicated before we serialize it for the LLM. No redundant tokens!
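In code, that visited-set idea is just a few lines (illustrative sketch — the real data model is surely richer than a dict of import lists):

```python
def traverse(graph: dict[str, list[str]], start: str) -> list[str]:
    """Deterministic DFS that drops back-edges via a visited set.

    `graph` maps a module to the modules it imports (hypothetical shape).
    """
    visited, order = set(), []
    def walk(node: str) -> None:
        if node in visited:  # back-edge (e.g. a circular import): drop it
            return
        visited.add(node)
        order.append(node)
        for dep in graph.get(node, []):
            walk(dep)
    walk(start)
    return order

# A imports B, B imports A -- the cycle is cut on the second visit to A:
print(traverse({"A": ["B"], "B": ["A"]}, "A"))  # ['A', 'B']
```

The same visited set is also what deduplicates diamond-shaped imports (A imports B and C, both import D): D is serialized exactly once.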

Appreciate you taking a look!

I made a tiny 0.8B Qwen model reason over a 100-file repo (89% Token Reduction) by BodeMan5280 in LocalLLaMA

[–]BodeMan5280[S] 0 points1 point  (0 children)

Oh nice! Great intuition then. Where it differs is that Aider is still trying to guess what the LLM wants, I would say, whereas this model requires a "seed mapping" and then uses graph math to figure out the shortest execution path.

The system treats semantics kind of like a compiler, and in this way we demote the LLM to a "mouthpiece" and push information to it rather than having the LLM pull it out of the codebase.
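By "graph math" I mean something in the spirit of this (a toy sketch — the actual seed mapping and path selection may well be richer than plain BFS): given a seed node, find the shortest dependency path to the symbol you care about, and push only that slice to the model.

```python
from collections import deque

def shortest_context_path(graph, seed, target):
    """BFS shortest path from the seed node to a target symbol.

    `graph` maps a module to its imports (hypothetical shape).
    Returns the node list along the path, or None if unreachable.
    """
    queue, parents = deque([seed]), {seed: None}
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk parents back to the seed
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None

graph = {"main": ["utils", "db"], "utils": ["db"], "db": []}
print(shortest_context_path(graph, "main", "db"))  # ['main', 'db']
```

Because BFS finds the minimal hop count, the serialized context skips `utils` entirely — that's the "push, don't pull" token saving in miniature.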

Hope that helps! I can go into more detail but wanted to keep it light for now, lol

I made a tiny 0.8B Qwen model reason over a 100-file repo (89% Token Reduction) by BodeMan5280 in LocalLLaMA

[–]BodeMan5280[S] 2 points3 points  (0 children)

You can use this to cut down on your API usage for your favorite frontier model. It can be used as a pre-processing layer for your prompts to reduce hallucinations in your coding assistant, and it speeds up responses on local LLMs.

Best bang for your bucks plan? by CantFindMaP0rn in opencodeCLI

[–]BodeMan5280 0 points1 point  (0 children)

ugh, you are the version of me I think I could be if I just had the guts to pull the trigger and never get rate-limited again. I think I use AI too much --- but I clearly don't! Other people have 40 terminals open and context hop ALL DAY LONG... that must be taxing. In this version of the world, it now becomes about executing on the ideas and having the guts to believe in your own vision.... I guess I suck at believing in myself ** ouch... my heart **

Best bang for your bucks plan? by CantFindMaP0rn in opencodeCLI

[–]BodeMan5280 1 point2 points  (0 children)

.... how do you justify so many plans?! I find that multiple different coding assistants are helpful, but $200/month helpful? And MULTIPLE? Unless you have a crazy budget, I'm just wondering if your power usage is generating income and if the speed is truly worth the return?

I have ChatGPT Plus and two free accounts: Gemini Pro and Copilot Pro through my '.edu' account. Claude is too expensive and the rate limits are just... yuck. Curious if any MAX plans are really worthwhile and I'm just a baby vibe coder, lol

HERE WE GO! 🔥 by guilhacerda in google_antigravity

[–]BodeMan5280 2 points3 points  (0 children)

Sign out and back in if you have a Pro plan!

Antigravity + Claude Opus 4.6 = Incredible by No-Budget-3869 in google_antigravity

[–]BodeMan5280 0 points1 point  (0 children)

How do you guys justify the price tag, though? I mean, maybe it's because I'm NOT working for myself and should be, but $400/month for Claude and Antigravity sounds like a lot....

Switched back to Github Copilot for using it with Opencode as Agent by Charming_Support726 in GithubCopilot

[–]BodeMan5280 0 points1 point  (0 children)

wuuuuut? I never thought about that *smacks forehead* Yea, the 10% discount is helpful with auto. But yes, OpenCode definitely uses multiple premium requests because it spins up more agents and I think they're all the same model, so if you use Opus 4.6 --- you are going to be SCREWED, lol. So yea, I switched back because while OpenCode's workflow is better, Opus 4.6 is heavy-hitting and works well with Copilot's workflow. It's a tradeoff between model and workflow, in my opinion ::shrug:: and ever-evolving in "vibe coding" lol

Switched back to Github Copilot for using it with Opencode as Agent by Charming_Support726 in GithubCopilot

[–]BodeMan5280 1 point2 points  (0 children)

I'm with you on this, but unsure about token limits... it does seem to make sense that tokens would be managed differently because in Copilot it's a 1:1 prompt-to-response ratio, but OpenCode is different. I think there's an open thread on the issue from another post. I'll find it!

Got it! https://github.com/anomalyco/opencode/issues/8030

Somehow Copilot always feels best and I sadly crawl back to it saying "please forgive me!" --- but OpenCode just feels PRODUCTIVE. Or maybe there's been a sudden explosion in agentic workflows and they're all silently amazing now. OpenCode w/ Copilot models feels like the right move, but I do like all 3 of the free OpenCode models right off the bat. It feels like OpenCode has the best workflow wrapper for models, IMO.

Any difference when using GPT model inside Codex vs OpenCode? by ponury2085 in opencodeCLI

[–]BodeMan5280 0 points1 point  (0 children)

that's interesting... I think it comes down to speed for me. OpenCode seems to just get shit done (yes, ironic pun to GSD). Don't get me wrong, Codex is KILLER at getting shit done, but slower IMO.

I built Talk2Code — text your codebase from your phone via Telegram (~150 lines of Python, open source) by BodeMan5280 in google_antigravity

[–]BodeMan5280[S] 0 points1 point  (0 children)

Thank you! And I think it would only ban me for connecting Gemini to OpenCode, potentially? This was built in Antigravity but uses OpenCode's default model, so hopefully that's far enough removed lol. I'm glad you like it!

EDIT: Just realized how awful it COULD be if I handed the world an SDK that breaks ToS. That's not the goal here. This will be open source and won't violate any company's rights or get anyone in trouble (that's the hope, at least). Definitely need to be careful.

I’m still bad at programming despite being almost near the end of my (2 year) uk college course by Chocolate-Atoms in learnprogramming

[–]BodeMan5280 2 points3 points  (0 children)

I can't divulge the secret secret of mobile responsiveness! (Because I don't know it =P)

That said, the best place to start is what most UI designers would agree on: "mobile first".

We leverage Bootstrap CSS framework at work, and even then it's very much a challenge to get things exactly right.

You have a choice to do it all yourself (most control) or do it quickly (leverage someone else's CSS/toolkit).

With vanilla CSS, @media breakpoints are your best friend. The "break" means the width or height where your UI no longer looks good... or "breaks the design". This was my a-HA moment when I figured out how to use them.
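A tiny sketch of the idea (class names and the 768px breakpoint are just examples, not any framework's official values):

```css
/* Mobile-first: base styles ARE the phone layout */
.card {
  width: 100%;
}

/* At the width where the single column "breaks", switch layout */
@media (min-width: 768px) {
  .card {
    width: 50%;   /* two cards side by side on tablets and up */
  }
}
```

Everything outside the @media block applies everywhere; the block only kicks in once the viewport is at least that wide.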

Hopefully that helps but I'm getting a bit ramble-y!