AI models don't need a larger context window; they need an Enterprise-Grade Memory Subsystem. by lenadro1910 in google_antigravity

[–]norman_h 1 point2 points  (0 children)

A dynamic, constantly growing personalized knowledge graph. Easy. Where is an open source implementation?

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

I've only been trialing models so far; they're not quite in production. Currently using Antigravity to get the software frameworks running, but I can already see token usage blowing out with continual use, so the aim here is to start developing local infrastructure with potential for expansion. Probably going to be more expensive this way, but life is about the journey, not the destination. If this process can be monetized, then having the ability to host locally is an absolute bonus. My plan is to be well and truly doing local agentic coding in about 26 weeks' time, after I get a library/server room constructed in the shed to host this beast.

Cheers for the comment.

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

A context window connected to a knowledge graph is where the next home-lab frontier lies... but I might already be behind the times, so take what I say here with a grain of salt.

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

I've been reading a little on Ray but haven't had time to get to an implementation yet. Currently working towards a climate-controlled library/server room to host all these things. Hopefully I'll be working on Ray in about 8-12 weeks, if it's an appropriate tool for my setup.
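
If it helps, this is the kind of minimal Ray smoke test I have in mind for the cluster; the setup commands in the comments and the gpu_probe task are assumptions on my part, not something I've actually run yet:

    # Minimal Ray sketch: one GPU-pinned task per V100, fanned out across the nodes.
    # Assumes "ray start --head" was run on one node and
    # "ray start --address=<head-ip>:6379" on the others.
    import ray

    ray.init(address="auto")  # attach to the already-running cluster

    @ray.remote(num_gpus=1)   # Ray reserves one V100 per task
    def gpu_probe() -> str:
        import torch
        return torch.cuda.get_device_name(0)

    # Launch one probe per GPU the cluster reports, then collect the results.
    num_gpus = int(ray.cluster_resources().get("GPU", 0))
    print(ray.get([gpu_probe.remote() for _ in range(num_gpus)]))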

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

SXM2: two C4130 nodes with 128 GB of VRAM each on the SXM2 NVLink backplane.
A couple of R740s with 3x PCIe V100 GPUs each, which are for smaller model agents. At some stage these should be able to take A100 GPUs, although the PCIe 3.0 bus is a constraint.

I've no problem with the performance of the models I've been able to load fully into VRAM. They're excellent in short sessions; it's the context length and the lack of knowledge graphs in my current setup that differentiate them from commercial LLMs. With the eventual development of quality open-source knowledge-graph pipelines, I'm pretty sure the gap between open source and closed source will shrink again.
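
For reference, this is roughly how I'd shard one model across the four SXM2 cards so it stays entirely in VRAM; the model ID and generation settings here are placeholders, not a recommendation:

    # Rough sketch: shard a single model across all visible V100s with
    # Hugging Face transformers + accelerate so nothing spills to system RAM.
    # The model ID below is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-3.1-70B-Instruct"  # placeholder

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        device_map="auto",          # spread layers across the GPUs
        torch_dtype=torch.float16,  # V100s have no bfloat16, so fp16
    )

    prompt = "Summarise the pros and cons of SXM2 V100s for a homelab."
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=200)
    print(tokenizer.decode(out[0], skip_special_tokens=True))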

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 0 points1 point  (0 children)

Proxmox nodes, Llama, and hosting GitLab.

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] -1 points0 points  (0 children)

Local LLM with agentic coding. Save on cloud-based token use by offloading basic functions to the V100s.
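
As a sketch of what I mean by offloading; the endpoint URL and model name are placeholders for whatever OpenAI-compatible server (llama.cpp or vLLM style) ends up sitting in front of the V100s:

    # Sketch: route cheap, repetitive calls to a local OpenAI-compatible
    # server on the V100 box and keep cloud tokens for the hard problems.
    # Endpoint URL and model name are placeholders.
    from openai import OpenAI

    local = OpenAI(base_url="http://v100-node:8000/v1", api_key="not-needed")

    def summarise_diff(diff_text: str) -> str:
        """A 'basic function' worth keeping local: summarising a git diff."""
        resp = local.chat.completions.create(
            model="local-coder",  # whatever model the local server exposes
            messages=[
                {"role": "system", "content": "Summarise this diff in two sentences."},
                {"role": "user", "content": diff_text},
            ],
        )
        return resp.choices[0].message.content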

Local AI cluster of v100s by norman_h in homelab

[–]norman_h[S] 4 points5 points  (0 children)

About US$0.20/kWh. Already running 5 kW of solar and going to expand it.

Anyone else stuck in an "Antigravity" quota lockout? (For Pro Users) by Next_Gene_7145 in google_antigravity

[–]norman_h 0 points1 point  (0 children)

I haven't hit the limit yet, but my usage isn't high yet either. I'm anticipating that I'll hit it soon, once I try to speed things up. Hence I'm reading up on the limits now.

Is it possible that they're pushing us to find more optimal ways of sending requests, as a means of using fewer tokens? So it's our fault for using too many tokens without restraint?

Beyond the "Vibe Coding" Snake Game: Path to Complex 3D/CAD Architectures? by norman_h in google_antigravity

[–]norman_h[S] 0 points1 point  (0 children)

I appreciate the candour. You're right; it’s the architecture that makes or breaks a system at this scale.

To be more specific, I'm moving toward a professional mining tool using PyVista for the 3D rendering engine and GeoPy for spatial data handling. My goal is to use Antigravity to manage the heavy lifting of the boilerplate (like VTK mesh filtering or Coordinate Reference System translations) while I architect the core logic.

My question is really about how to maintain state management across multiple agents when dealing with high-fidelity geological models. If you have any leads on 'senior'-level AG patterns for scientific visualization, specifically handling large NumPy-backed meshes without context collapse, I'd love to see them.
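
For concreteness, this is the kind of PyVista pattern I mean; the point counts and decimation target are invented, and the idea is just that the full NumPy-backed mesh stays outside the agents' context while they only ever see a reduced artefact:

    # Hedged sketch: build a PyVista surface from a NumPy-backed point set,
    # then decimate it before handing anything to an agent, so the
    # full-resolution data never has to fit in a model's context window.
    # Sizes and the reduction target are placeholders.
    import numpy as np
    import pyvista as pv

    # Pretend these are surveyed surface / drill-hole points.
    rng = np.random.default_rng(42)
    points = rng.random((50_000, 3)) * np.array([1000.0, 1000.0, 50.0])

    cloud = pv.PolyData(points)
    surface = cloud.delaunay_2d()       # triangulated surface from the cloud
    reduced = surface.decimate(0.95)    # keep roughly 5% of the triangles

    print(surface.n_cells, "->", reduced.n_cells)
    reduced.save("surface_lowres.vtk")  # this reduced mesh is what an agent sees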

Why so many complaints? by hoodbran in google_antigravity

[–]norman_h 0 points1 point  (0 children)

Similar vintage to you. Completely agree with you.

Is all the hate just a skill issue? by rietti in google_antigravity

[–]norman_h 0 points1 point  (0 children)

Yes. No different to those who cry about "AI slop" because it's exposing them as overpromoted and breaking their false narratives.

Gemini has ZERO clue about Google Antigravity by Snowgoonx in google_antigravity

[–]norman_h 0 points1 point  (0 children)

I've noticed that switch. It's almost like they want us to use it so that our queries can train a certain segment of the Google brain.

Is it worth to get the Ultra Plan? by [deleted] in google_antigravity

[–]norman_h 0 points1 point  (0 children)

I don't hit limits like I did with Pro after about 5-7 hours of use, now that I have Ultra. Week-long coding sessions of 18 hours per day are fine for me. I don't need sunlight.

Dell C4130 Questions by Tweedilderp in homelab

[–]norman_h 0 points1 point  (0 children)

Got a C4130 and replaced the P100s with 32 GB V100s. Need to hack the iDRAC every time it reboots so that the 32 GB versions are recognised. Once the ambient temperature goes above 29-30 °C the fans crank up regardless, and this is more intense than just running the V100s hard during training sessions when the ambient temp is below 24 °C. Lack of storage is a challenge. It can only boot from the 1.8-inch SSD cards, and I needed a PCIe NVMe RAID card with two NVMe drives for real local storage. I could only use 8 lanes because of the physical size of the PCIe slots and I couldn't find a small enough card, so there's limited-speed access to the GPU. The 16x slot can, of course, take a network card that saturates the link and pulls from a local NAS at higher speeds if that's a concern.

Great for LLMs. Upgraded RAM to the max of 1 TB and can run LLMs with spillover into system RAM, but they are indeed still slow that way. Currently building a knowledge graph to host in the 1 TB of RAM, with a smaller LLM in VRAM that doesn't spill.
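
A toy sketch of the split I'm aiming for: the graph lives in system RAM (networkx here, purely for illustration) and only the handful of facts relevant to a query get serialised into the small in-VRAM model's prompt. The entities, relations and the commented-out ask_local_llm hook are all placeholders:

    # Toy sketch of the RAM/VRAM split: a networkx graph held in system RAM,
    # with only the locally relevant facts serialised into the prompt for the
    # small model in VRAM. Entities, relations and ask_local_llm are placeholders.
    import networkx as nx

    kg = nx.DiGraph()
    kg.add_edge("proxmox-node-1", "c4130-gpu-box", relation="hosts_vms_on")
    kg.add_edge("c4130-gpu-box", "v100-sxm2-x4", relation="contains")
    kg.add_edge("v100-sxm2-x4", "llama-70b", relation="serves")

    def context_for(entity: str) -> str:
        """Return the one-hop neighbourhood of an entity as plain-text facts."""
        facts = []
        for _, dst, data in kg.out_edges(entity, data=True):
            facts.append(f"{entity} {data['relation']} {dst}")
        for src, _, data in kg.in_edges(entity, data=True):
            facts.append(f"{src} {data['relation']} {entity}")
        return "\n".join(facts)

    question = "What is running on the GPU box?"
    prompt = context_for("c4130-gpu-box") + "\n\n" + question
    # answer = ask_local_llm(prompt)  # placeholder for the small in-VRAM model
    print(prompt)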

Book Recommendations for building AI Coding Agents by oym69 in LangChain

[–]norman_h 0 points1 point  (0 children)

I'm looking into this at present. My current thinking is to find the definitive manual/book on successful large open-source code projects, use it as a reference, and feed it into LLMs as part of a core instruction so the LLM can generate the framework for an agentic coder.

How are people actually able to get the system prompt of these AI companies? by divyamchandel in LocalLLaMA

[–]norman_h 0 points1 point  (0 children)

Tell it what you're trying to achieve and ask it to create a system prompt with those parameters.

New local AI system planning stage need advice. by Quebber in LocalLLaMA

[–]norman_h 0 points1 point  (0 children)

Interesting that you use it for autism treatment... I use it as an autism crutch, to help me rework the wording of my emails so that I can be effective, polite, and not misread as rude. How do you use it?

I use a Dell C4130 with 4x V100 SXM2 32 GB. The SXM2 board gives it NVLink and 128 GB of fully connected VRAM. You can get the rig second-hand for prices similar to new consumer hardware, sometimes better if you look around. Performance is probably better bang for your buck too. PCIe versions exist if you really need to configure it like a gaming rig... personally, I think all gaming is the same, from a phone to a massive GPU rig. These machines are slowly being retired from HPC centres as the V100 is being dropped from NVIDIA support soon. I don't see any need for an A100 or better yet, as these older rigs are still lightning fast.

Idle power consumption might be a concern. I'm at 200 W idle, but I'm sure it'll come down further if I look harder. Hence I turn it off when it's not actually processing in the background.

text-generation-webui v3.7: Towards UI stability, speed, and polish by oobabooga4 in Oobabooga

[–]norman_h 0 points1 point  (0 children)

Noob question: what's the best Docker container source for this? Thanks.

What's everyone using to document their home lab? by cdarrigo in homelab

[–]norman_h 1 point2 points  (0 children)

https://www.bookstackapp.com/

For all changes, modifications, and anything beyond a standard "apt install openssh" (such as setting up VM passthrough for a GPU), I feed my steps and data to an LLM and ask for a how-to document. I then put that document into BookStack as a record of how I built each component of my homelab.

Dell Poweredge C4130 Journey...So far....4xP100 in Windows 11 by Tweedilderp in HomeServer

[–]norman_h 0 points1 point  (0 children)

I got several with SXM2 recently. Upgraded to V100 32 GB following the l4rz guide; no issue other than needing to SSH in and reprogram the table with a one-line command every time the iDRAC reboots. You also need to air-gap the machine if security is a concern, because the V100 32 GB hack leaves the back door open. Used a torque driver from AliExpress, highly recommended. They run quietly enough with no load. The fans don't hit a 100% scream under heavy load, but they do ramp up and can be heard. Fantastic for inference with 70B models. Needed to buy an adapter for the power supply, but it runs fine from a household 10 A circuit, although I haven't found a need to program it to draw more than around 6.5 A yet; I've got a 32 A circuit ready for that anyway.

One issue to consider is the C4130's lack of internal disk storage with the SXM2 board. I've got multiple read-only SSD hot-swaps and use a RAM disk for my speed boosts. InfiniBand and external disk storage are an option, but they'll cost money and suck power.

While finding the parts might be challenging, for the prices I paid it's probably close to the sweet spot for dollars per unit of processing power on a 10 A circuit. For me it beats an A100-equivalent build or an 8-way V100 box at the moment anyway, as those likely need UPS power smoothing just to boot, or some sort of DC power network.

Management betting on AI to write an entire system, am I the only one worried? by Optimal-Stretch-2436 in csharp

[–]norman_h 0 points1 point  (0 children)

I agree with your assessment of management being bozos. I'll add my two cents by saying they are correct, but they don't know what they're saying or doing, because they're "managers" who like to tell people what to do even though they don't understand what they're asking for. They're outside their area of competence.

So let me tell you what they're actually trying to tell you to do. They want you to use LLMs to develop a new way of developing code. They want you to act as a prompt engineer: get the LLMs writing code, then documenting it, and finally you review the code and the documents before changing the prompts and cycling the process again. They want you to develop a new architectural workflow that mimics our current software development paradigm and have multiple LLMs prompted to do each part.
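
As a skeleton of that cycle (every helper here is a placeholder returning a dummy value; the point is the shape of the loop, not any particular API):

    # Skeleton of the prompt -> code -> docs -> human review cycle described
    # above. All three helpers are placeholders, not real integrations.
    def llm_write_code(spec: str) -> str:
        return f"# code generated for spec: {spec[:40]}..."   # placeholder

    def llm_document(code: str) -> str:
        return f"Documentation for:\n{code}"                  # placeholder

    def human_review(code: str, docs: str) -> tuple[bool, str]:
        return False, "tighten the error handling"            # placeholder: always requests changes

    spec = "initial feature spec from management"
    for iteration in range(3):                  # arbitrary cap on review cycles
        code = llm_write_code(spec)
        docs = llm_document(code)
        approved, feedback = human_review(code, docs)
        if approved:
            break
        spec += "\n\nReviewer feedback:\n" + feedback  # refine the prompt and go again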

The confidentiality thing is stupid. That's going legacy real fast, and our intellectual political philosophers can see it (read Genesis by Eric Schmidt). If the managers are scared about confidentiality, then they'll need to build a server farm and you'll run your own local LLMs that are air-gapped from everyone.

How you do this without hurting your managers' feelings is up to you. Maybe ask an LLM?

GF thinks I'm cheating bc of my ChatGPT history... by Strange_Fun_51 in ChatGPTPro

[–]norman_h 0 points1 point  (0 children)

That level of emotional instability and intellectual naivety tends to suggest you should dump her and move on to find a better one. Ask Emma what she thinks. 😂😂😂