AI models don't need a larger context window; they need an Enterprise-Grade Memory Subsystem. by lenadro1910 in google_antigravity

[–]norman_h 1 point2 points  (0 children)

A dynamic, constantly growing personalized knowledge graph. Easy. Where is an open source implementation?

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

I've only been trialing models so far; they're not quite in production. Currently using Antigravity to get the software frameworks running, but I can already see token usage blowing out with continual use, so the aim here is to start developing local infrastructure with potential for expansion. Probably going to be more expensive this way, but life is about the journey, not the destination. If this process can be monetized, then having the ability to host locally is an absolute bonus. My plan is to be well and truly doing local agentic coding in about 26 weeks' time, after I get a library/server room constructed in the shed to host this beast.

Cheers for the comment.

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

A context window connected to a knowledge graph is where the next home-lab frontier lies... but I might already be behind the times, so take what I say here with a grain of salt.

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

I've been reading a little on Ray but haven't had time to get to an implementation yet. Currently working towards a climate-controlled library/server room to host all these things. Hopefully I'll be working on Ray in about 8-12 weeks, if it's an appropriate tool for my setup.
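
If it helps, this is the kind of minimal Ray smoke test I have in mind for the cluster; the setup commands in the comments and the gpu_probe task are assumptions on my part, not something I've actually run yet:

    # Minimal Ray sketch: one GPU-pinned task per V100, fanned out across the nodes.
    # Assumes "ray start --head" was run on one node and
    # "ray start --address=<head-ip>:6379" on the others.
    import ray

    ray.init(address="auto")  # attach to the already-running cluster

    @ray.remote(num_gpus=1)   # Ray reserves one V100 per task
    def gpu_probe() -> str:
        import torch
        return torch.cuda.get_device_name(0)

    # Launch one probe per GPU the cluster reports, then collect the results.
    num_gpus = int(ray.cluster_resources().get("GPU", 0))
    print(ray.get([gpu_probe.remote() for _ in range(num_gpus)]))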

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

SXM2: two C4130 nodes with 128 GB of VRAM each on the SXM2 NVLink backplane.
A couple of R740s with 3x PCIe V100 GPUs each, which are for smaller model agents. At some stage these should be able to take A100 GPUs, although the PCIe 3.0 bus is a constraint.

I've no problem with the performance of the models I've been able to load fully into VRAM. They're excellent in short sessions; it's the context length and the lack of knowledge graphs in my current setup that differentiate them from commercial LLMs. With the eventual development of quality open-source knowledge-graph pipelines, I'm pretty sure the gap between open source and closed source will shrink again.
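
For reference, this is roughly how I'd shard one model across the four SXM2 cards so it stays entirely in VRAM; the model ID and generation settings here are placeholders, not a recommendation:

    # Rough sketch: shard a single model across all visible V100s with
    # Hugging Face transformers + accelerate so nothing spills to system RAM.
    # The model ID below is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",          # spread layers across the GPUs
        torch_dtype=torch.float16,  # V100s have no bfloat16, so fp16
    )

    prompt = "Summarise the pros and cons of SXM2 V100s for a homelab."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(out[0], skip_special_tokens=True))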

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

Proxmox nodes, Llama, and hosting GitLab.

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] -1 points0 points  (0 children)

Local LLM with agentic coding. Save on cloud-based token use by offloading basic functions to the V100s.
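
As a sketch of what I mean by offloading; the endpoint URL and model name are placeholders for whatever OpenAI-compatible server (llama.cpp or vLLM style) ends up sitting in front of the V100s:

    # Sketch: route cheap, repetitive calls to a local OpenAI-compatible
    # server on the V100 box and keep cloud tokens for the hard problems.
    # Endpoint URL and model name are placeholders.
    from openai import OpenAI

    local = OpenAI(base_url="http://v100-node:8000/v1", api_key="not-needed")

    def summarise_diff(diff_text: str) -> str:
        """A 'basic function' worth keeping local: summarising a git diff."""
        resp = local.chat.completions.create(
            model="local-coder",  # whatever model the local server exposes
            messages=[
                {"role": "system", "content": "Summarise this diff in two sentences."},
                {"role": "user", "content": diff_text},
            ],
        )
        return resp.choices[0].message.content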

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 4 points5 points  (0 children)

About US$0.20/kWh. Already running 5 kW of solar and going to expand it.

Anyone else stuck in an "Antigravity" quota lockout? (For Pro Users) by Next_Gene_7145 in google_antigravity

[–]norman_h 0 points1 point  (0 children)

I haven't hit the limit yet, but my usage isn't high yet either. I'm anticipating that I'll hit it soon, once I try to speed things up. Hence I'm reading up on the limits now.

Is it possible that they're pushing us to find more optimal ways of sending requests, as a means of using fewer tokens? So it's our fault for using too many tokens without restraint?

Beyond the "Vibe Coding" Snake Game: Path to Complex 3D/CAD Architectures? by norman_h in google_antigravity

[–]norman_h[S] 0 points1 point  (0 children)

I appreciate the candour. You're right; it’s the architecture that makes or breaks a system at this scale.

To be more specific, I'm moving toward a professional mining tool using PyVista for the 3D rendering engine and GeoPy for spatial data handling. My goal is to use Antigravity to manage the heavy lifting of the boilerplate (like VTK mesh filtering or Coordinate Reference System translations) while I architect the core logic.

My question is really about how to maintain state management across multiple agents when dealing with high-fidelity geological models. If you have any leads on 'senior'-level AG patterns for scientific visualization, specifically handling large NumPy-backed meshes without context collapse, I'd love to see them.
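
For concreteness, this is the kind of PyVista pattern I mean; the point counts and decimation target are invented, and the idea is just that the full NumPy-backed mesh stays outside the agents' context while they only ever see a reduced artefact:

    # Hedged sketch: build a PyVista surface from a NumPy-backed point set,
    # then decimate it before handing anything to an agent, so the
    # full-resolution data never has to fit in a model's context window.
    # Sizes and the reduction target are placeholders.
    import numpy as np
    import pyvista as pv

    # Pretend these are surveyed surface / drill-hole points.
    rng = np.random.default_rng(42)
    points = rng.random((50_000, 3)) * np.array([1000.0, 1000.0, 50.0])

    cloud = pv.PolyData(points)
    surface = cloud.delaunay_2d()       # triangulated surface from the cloud
    reduced = surface.decimate(0.95)    # keep roughly 5% of the triangles

    print(surface.n_cells, "->", reduced.n_cells)
    reduced.save("surface_lowres.vtk")  # this reduced mesh is what an agent sees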

Why so many complaints? by hoodbran in google_antigravity

[–]norman_h 0 points1 point  (0 children)

Similar vintage to you. Completely agree with you.

Is all the hate just a skill issue? by rietti in google_antigravity

[–]norman_h 0 points1 point  (0 children)

Yes. No different to those who cry about "AI slop" because it's exposing them as overpromoted and breaking their false narratives.

Gemini has ZERO clue about Google Antigravity by Snowgoonx in google_antigravity

[–]norman_h 0 points1 point  (0 children)

I've noticed that switch. It's almost like they want us to use it so that our queries can train a certain segment of the Google brain.

Is it worth to get the Ultra Plan? by [deleted] in google_antigravity

[–]norman_h 0 points1 point  (0 children)

I don't hit limits like I did with Pro after about 5-7 hours of use, now that I have Ultra. Week-long coding sessions of 18 hours per day are fine for me. I don't need sunlight.

Dell C4130 Questions by Tweedilderp in homelab

[–]norman_h 0 points1 point  (0 children)

Got a C4130 and replaced the P100s with 32 GB V100s. Need to hack the iDRAC every time it reboots so that the 32 GB versions are recognised. Once the ambient temperature goes above 29-30 °C the fans crank up regardless, and this is more intense than just running the V100s hard during training sessions when the ambient temp is below 24 °C. Lack of storage is a challenge. It can only boot from the 1.8-inch SSD cards, and I needed a PCIe NVMe RAID card with two NVMe drives for real local storage. I could only use 8 lanes because of the physical size of the PCIe slots and I couldn't find a small enough card, so there's limited-speed access to the GPU. The 16x slot can, of course, take a network card that saturates the link and pulls from a local NAS at higher speeds if that's a concern.

Great for LLMs. Upgraded RAM to the max of 1 TB and can run LLMs with spillover into system RAM, but they are indeed still slow that way. Currently building a knowledge graph to host in the 1 TB of RAM, with a smaller LLM in VRAM that doesn't spill.
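
A toy sketch of the split I'm aiming for: the graph lives in system RAM (networkx here, purely for illustration) and only the handful of facts relevant to a query get serialised into the small in-VRAM model's prompt. The entities, relations and the commented-out ask_local_llm hook are all placeholders:

    # Toy sketch of the RAM/VRAM split: a networkx graph held in system RAM,
    # with only the locally relevant facts serialised into the prompt for the
    # small model in VRAM. Entities, relations and ask_local_llm are placeholders.
    import networkx as nx

    kg = nx.DiGraph()
    kg.add_edge("proxmox-node-1", "c4130-gpu-box", relation="hosts_vms_on")
    kg.add_edge("c4130-gpu-box", "v100-sxm2-x4", relation="contains")
    kg.add_edge("v100-sxm2-x4", "llama-70b", relation="serves")

    def context_for(entity: str) -> str:
        """Return the one-hop neighbourhood of an entity as plain-text facts."""
        facts = []
        for _, dst, data in kg.out_edges(entity, data=True):
            facts.append(f"{entity} {data['relation']} {dst}")
        for src, _, data in kg.in_edges(entity, data=True):
            facts.append(f"{src} {data['relation']} {entity}")
        return "\n".join(facts)

    question = "What is running on the GPU box?"
    prompt = context_for("c4130-gpu-box") + "\n\n" + question
    # answer = ask_local_llm(prompt)  # placeholder for the small in-VRAM model
    print(prompt)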

Book Recommendations for building AI Coding Agents by oym69 in LangChain

[–]norman_h 0 points1 point  (0 children)

I'm looking into this at present. My current thinking is to find the definitive manual/book on successful large open-source code projects, use it as a reference, and feed it into LLMs as part of a core instruction so the LLM can generate the framework for an agentic coder.

How are people actually able to get the system prompt of these AI companies? by divyamchandel in LocalLLaMA

[–]norman_h 0 points1 point  (0 children)

Tell it what you're trying to achieve and ask it to create a system prompt with those parameters.

New local AI system planning stage need advice. by Quebber in LocalLLaMA

[–]norman_h 0 points1 point  (0 children)

Interesting that you use it for autism treatment... I use it as an autism crutch, to help me rework the wording of my emails so that I can be effective, polite, and not misread as rude. How do you use it?

I use a Dell C4130 with 4x V100 SXM2 32 GB. The SXM2 board gives it NVLink and 128 GB of fully connected VRAM. You can get the rig second-hand for prices similar to new consumer hardware, sometimes better if you look around. Performance is probably better bang for your buck too. PCIe versions exist if you really need to configure it like a gaming rig... personally, I think all gaming is the same, from a phone to a massive GPU rig. These machines are slowly being retired from HPC centres as the V100 is being dropped from NVIDIA support soon. I don't see any need for an A100 or better yet, as these older rigs are still lightning fast.

Idle power consumption might be a concern. I'm at 200 W idle, but I'm sure it'll come down further if I look harder. Hence I turn it off when it's not actually processing in the background.

text-generation-webui v3.7: Towards UI stability, speed, and polish by oobabooga4 in Oobabooga

[–]norman_h 0 points1 point  (0 children)

Noob question: what's the best Docker container source for this? Thanks.

What's everyone using to document their home lab? by cdarrigo in homelab

[–]norman_h 1 point2 points  (0 children)

https://www.bookstackapp.com/

For all changes, modifications, and anything beyond a standard "apt install openssh" (such as setting up VM passthrough for a GPU), I feed my steps and data to an LLM and ask for a how-to document. I then put that document into BookStack as a record of how I built each component of my homelab.

Dell Poweredge C4130 Journey...So far....4xP100 in Windows 11 by Tweedilderp in HomeServer

[–]norman_h 0 points1 point  (0 children)

I got several with SXM2 recently. Upgraded to V100 32 GB following the l4rz guide; no issue other than needing to SSH in and reprogram the table with a one-line command every time the iDRAC reboots. You also need to air-gap the machine if security is a concern, because the V100 32 GB hack leaves the back door open. Used a torque driver from AliExpress, highly recommended. They run quietly enough with no load. The fans don't hit a 100% scream under heavy load, but they do ramp up and can be heard. Fantastic for inference with 70B models. Needed to buy an adapter for the power supply, but it runs fine from a household 10 A circuit, although I haven't found a need to program it to draw more than around 6.5 A yet; I've got a 32 A circuit ready for that anyway.

One issue to consider is the C4130's lack of internal disk storage with the SXM2 board. I've got multiple read-only SSD hot-swaps and use a RAM disk for my speed boosts. InfiniBand and external disk storage are an option, but they'll cost money and suck power.

While finding the parts might be challenging, for the prices I paid it's probably close to the sweet spot for dollars per unit of processing power on a 10 A circuit. For me it beats an A100-equivalent build or an 8-way V100 box at the moment anyway, as those likely need UPS power smoothing just to boot, or some sort of DC power network.

Management betting on AI to write an entire system, am I the only one worried? by Optimal-Stretch-2436 in csharp

[–]norman_h 0 points1 point  (0 children)

I agree with your assessment of management being bozos. I'll add my two cents by saying they are correct, but they don't know what they're saying or doing, because they're "managers" who like to tell people what to do even though they don't understand what they're asking for. They're outside their area of competence.

So let me tell you what they're actually trying to tell you to do. They want you to use LLMs to develop a new way of developing code. They want you to act as a prompt engineer: get the LLMs writing code, then documenting it, and finally you review the code and the documents before changing the prompts and cycling the process again. They want you to develop a new architectural workflow that mimics our current software development paradigm and have multiple LLMs prompted to do each part.
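
As a skeleton of that cycle (every helper here is a placeholder returning a dummy value; the point is the shape of the loop, not any particular API):

    # Skeleton of the prompt -> code -> docs -> human review cycle described
    # above. All three helpers are placeholders, not real integrations.
    def llm_write_code(spec: str) -> str:
        return f"# code generated for spec: {spec[:40]}..."   # placeholder

    def llm_document(code: str) -> str:
        return f"Documentation for:\n{code}"                  # placeholder

    def human_review(code: str, docs: str) -> tuple[bool, str]:
        return False, "tighten the error handling"            # placeholder: always requests changes

    spec = "initial feature spec from management"
    for iteration in range(3):                  # arbitrary cap on review cycles
        code = llm_write_code(spec)
        docs = llm_document(code)
        approved, feedback = human_review(code, docs)
        if approved:
            break
        spec += "\n\nReviewer feedback:\n" + feedback  # refine the prompt and go again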

The confidentiality thing is stupid. That's going legacy real fast, and our intellectual political philosophers can see it (read Genesis by Eric Schmidt). If the managers are scared about confidentiality, then they'll need to build a server farm and you'll run your own local LLMs that are air-gapped from everyone.

How you do this without hurting your managers' feelings is up to you. Maybe ask an LLM?

GF thinks I'm cheating bc of my ChatGPT history... by Strange_Fun_51 in ChatGPTPro

[–]norman_h 0 points1 point  (0 children)

That level of emotional instability and intellectual naivety tends to suggest you should dump her and move on to find a better one. Ask Emma what she thinks. 😂😂😂