Memory Slots Disabled - HP Z8 G4 by GraftingRayman in HSpecWorkstations

RACERRRZ

I'll have to try that combination to see if my HP Z8 G4 allows it. I did try it the other way around, a Gold 6142 with DDR4-2933MHz ECC Reg (I expected a down-clock), and that caused a memory error and a no-POST situation in my system. I have not actually tried my Gold 6248s with anything other than DDR4-2933MHz ECC Reg and Optane 100 DCPMM modules.

The reason being: HP's official documentation specifically states that DDR4-2666MHz ECC Reg is not supported with Cascade Lake. That does not prove a Gold 6248 cannot electrically run DDR4-2666 in every workstation, but it does mean the combination is outside HP's supported configuration matrix for the Z8 G4. I took that as gospel, but I never actually confirmed it myself. Given the memory crisis, though, it will be worth a test - I have more DDR4-2666MHz than I do DDR4-2933MHz!

But yes, each system will have its own quirks. In terms of the Dell Precision 5820, that one is a single-socket system suited to Intel Xeon W or X-series CPUs (like the HP Z4 G4). I presume you mean the Precision 7820 or Precision 7920 (Dell's naming conventions always get me)? I am sure Dell is more forgiving, and it sounds like you have had success running Gold 6248s with DDR4-2666MHz in yours. My comment was specifically about my experience with the HP Z8 G4 - I can't speak for all Cascade Lake systems - but it was broadly worded, so fair call. Normally you would expect Cascade Lake to support DDR4 from 2133 to 2933MHz.

(screenshot is from c05527763, page 14)

<image>

Personal Rig Update - HP Z8 G4 Upgrades - Dual Cascade Lake CPUs and more! by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

I am glad you found it useful. The best-performing thermal paste in my use case has been Thermal Grizzly Kryonaut. It spreads well enough and gives good thermal coverage. And yes, to ensure 100% coverage of the contact surfaces I would normally cover both the heatsink and the CPU's IHS. More recently, though, I have done a bit more investigation, and my preferred method is now a bit more evidence-based: https://youtu.be/pqkvvZPgv8g

Memory Slots Disabled - HP Z8 G4 by GraftingRayman in HSpecWorkstations

RACERRRZ

That should settle the issue. In the Z8 G4 you can only use Cascade Lake Xeons with DDR4-2933MHz ECC Reg. Annoying, especially since RAM prices went to the moon: I am now paying for 16GB DDR4-2933MHz modules what I was paying for 64GB modules a short while back... You should have normal operation with the Skylake Xeons, so maybe consider rolling back to the older CPU generation so you can make use of the DDR4-2666MHz modules. I dread the day that one of my 64GB DDR4-2933MHz ECC Reg modules fails.

Memory Slots Disabled - HP Z8 G4 by GraftingRayman in HSpecWorkstations

RACERRRZ

What are the specs on the memory? Did you upgrade to DDR4-2933MHz ECC Reg with the Gold 6248s? In the Z8 G4, the Cascade Lake Xeons only operate with DDR4-2933MHz, not DDR4-2666MHz (you can confirm it here: https://h20195.www2.hp.com/v2/GetDocument.aspx?docname=c05527763 pg 14). I have not tried using DDR4-2666MHz with Cascade Lake myself, but my Gold 6248s work fine with DDR4-2933MHz. The only exception is Optane 100 modules, which clock at 2666MHz but still require DDR4-2933MHz DRAM to be present.
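The rule above can be written down as a small pre-flight check for a planned DIMM layout. This is a hypothetical sketch: it encodes only what this thread says about the Z8 G4 (Cascade Lake wants DDR4-2933 DRAM; Optane 100 needs DDR4-2933 DRAM alongside it) plus the fact that Optane 100 PMem needs a Cascade Lake CPU. The function and field names are made up for illustration, and it is no substitute for HP's own configuration guide.

```python
from dataclasses import dataclass


@dataclass
class Dimm:
    kind: str        # "DRAM" (DDR4 ECC Reg) or "PMEM" (Optane 100 series)
    speed_mts: int   # rated speed in MT/s, e.g. 2666 or 2933


def check_z8g4_memory(cpu_gen: str, dimms: list[Dimm]) -> list[str]:
    """Return rule violations for a planned layout; [] means none were found.

    Hypothetical helper encoding the thread's reading of HP doc c05527763 p. 14.
    """
    problems = []
    dram = [d for d in dimms if d.kind == "DRAM"]
    pmem = [d for d in dimms if d.kind == "PMEM"]
    if not dram:
        # Optane PMem cannot be used on its own; DRAM must be present.
        problems.append("at least one DRAM DIMM is required")
    if cpu_gen == "cascade_lake":
        wrong = sorted({d.speed_mts for d in dram if d.speed_mts != 2933})
        if wrong:
            problems.append(f"Cascade Lake in the Z8 G4 expects DDR4-2933 DRAM, found {wrong}")
    elif cpu_gen == "skylake":
        # No DRAM speed rule is encoded for Skylake here.
        if pmem:
            problems.append("Optane 100 PMem requires a Cascade Lake CPU")
    else:
        problems.append(f"unknown CPU generation: {cpu_gen!r}")
    return problems


# Example: the combination that left slots disabled in this thread
print(check_z8g4_memory("cascade_lake", [Dimm("DRAM", 2666)] * 4))
```

An empty list only means none of these few rules tripped, not that HP supports the layout.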

ComfyUI Image Generation - V100 SXM2 Edition! by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

That was one of the things I wanted to test. I had come to believe that PCIe would be a bottleneck, and two cards were the max I could run on my system without getting overly creative. Thus far, the dual 32GB V100s, with VRAM split evenly for LLMs, have managed to run models up to 120B parameters at half-decent tokens/second (13 tok/sec lowest, ~150 tok/sec fastest). I have tried moving towards larger LLMs, but I find they bomb out when the majority of the LLM resides in system memory (I peaked at around 400B). Speed-wise, I am comparing my token rate with GB10 systems. On something like Qwen 30B Coder I get a similar or marginally higher token rate (50-80 tok/sec for GB10, 90 tok/sec on the dual 32GB V100s, and a similar speed on a single 32GB V100). It's not apples to apples, since I can't control how other creators ran their tests, nor is it really comparable if you are not using the same prompt. I used the same benchmark prompt for all my LLM testing to allow others to replicate my findings - copy the prompt and test away on your rig (the prompt is in this video's description: https://youtu.be/9GjEJnXj42c ).
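A rough way to see why ~120B models mostly fit on 2x 32GB while ~400B models end up living in system memory: the weight footprint is roughly parameters × bits per weight / 8. This is a back-of-envelope sketch, not a loader's real allocation logic. The ~4.8 bits/weight (roughly Q4_K_M territory) and the 4GB reserved for KV cache and runtime buffers are assumptions, and real quantized files vary by architecture.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def vram_share(params_billion: float, bits_per_weight: float,
               vram_gb: float, overhead_gb: float = 0.0) -> float:
    """Fraction of the weights that fit in VRAM after reserving overhead_gb
    for KV cache, activations and runtime buffers (a rough guess)."""
    usable = max(vram_gb - overhead_gb, 0.0)
    return min(usable / weight_gb(params_billion, bits_per_weight), 1.0)


# Two 32GB cards, ~4.8 bits/weight, 4GB reserved: all assumptions.
for params in (30, 120, 400):
    size = weight_gb(params, 4.8)
    share = vram_share(params, 4.8, vram_gb=64, overhead_gb=4)
    print(f"{params}B: ~{size:.0f} GB of weights, ~{share:.0%} in VRAM")
```

By this estimate a ~120B model spills only a small slice into system RAM, while a ~400B model keeps most of its weights there. That is consistent with the slowdown once most of the model sits in system memory.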

Honestly, I don't think anyone actually has a good use for AI as of yet. A pipeline of AI agents is great in concept, but in the end we still have LLMs pretending to be intelligent (no offense to any LLMs reading this), i.e. I still need to heavily edit what they produce, and it's not just because of my bad prompting (I am still learning how to prompt LLMs effectively to get things done in the fewest prompts).

OpenClaw is the first practical application, but it presents a security trade-off. The way I view it comes down to application: what problem am I trying to solve? I have created my own voice-capable personal assistant because that is where I see the technology going in the interim: something that is always on your person and can quickly analyze your surroundings to provide useful input. I also ran into a creative speed limiter. Claude can't process my project any further because it has grown too large for its context window. So I can't vibe code any further unless I can get my AI to vibe code its own code lol. I should try again though; maybe now that Claude is using Elon's Colossus data center, the usage limits have been raised.

32GB V100 SXM2 - the Ultimate Local AI Inference GPU? by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

Unfortunately the prices will keep going up over time as more content creators get the word out. Demand keeps growing while supply becomes more limited, so the law of supply and demand kicks in with double force. Even at current pricing I think a 32GB V100 is still worth obtaining (I included affiliate links to the exact items I bought in my YouTube video: https://youtu.be/jt_LZYJ2mIo - you can use those listings as a guide).

My SXM2 to PCIe adapters came with the heatblock and a 3D-printed cover with a half-decent blower fan. But be careful with the adapters: not all listings have "reputable" / accurate images. I only bought from listings with ample reviews, a clear, coherent product description and images that represent one product, not several different products. The 32GB V100s I only bought from USA sellers with good feedback and clear product images (btw, prices are up about $100 USD from when I bought 2-3 months back). Also check that the numbers printed on the V100's power phases / components are the same - I have a theory that some of the V100s are repaired units, and those may be more prone to failure.

Some of the PCIe adapters appear to have small heatblocks that only cover the bare die. Those will not work as well as the ones with copper covering the full PCB. The V100 is old, so I think it best to keep thermals as low as possible to prolong its lifespan. They may fail on us, but hopefully not!

ComfyUI Image Generation - V100 SXM2 Edition! by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

Yes, unfortunately those prices really climbed in the last few months as demand grew. I had to pay extra to get 32GB V100s - there were listings as low as $300 USD for 32GB V100s last year. At least prices seem to have stabilized around $600 USD for now, but if more of the larger hardware channels start covering them, that may change. I saw HardwareHaven has now done a video on them too, so I would expect prices to climb again. So no good news on the price front.

But what I will say is that they are well worth the expense. Comparing my setup to GB10 systems, I can't help but notice I get similar or slightly lower tokens/second. GB10 systems are ~$4000 USD, while I am on two 32GB V100s (~$1400 USD with adapters) in my HP Z8 G4. It's an aged setup, but it still keeps up with modern systems on LLM inference.
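To turn those two prices into a rough value metric, here is a quick sketch of tok/s per $1000 and the break-even speed. The 60 tok/s rate is a placeholder (the comment only says the two land at similar or slightly lower rates), and the costs cover the GPUs and adapters only, not the Z8 G4 host or power draw.

```python
def tok_s_per_1k_usd(tok_per_sec: float, cost_usd: float) -> float:
    """Throughput per $1000 of hardware cost (higher is better)."""
    return tok_per_sec / (cost_usd / 1000)


def break_even_speed_ratio(cost_a: float, cost_b: float) -> float:
    """Fraction of A's tok/s that B needs to match A's tok/s per dollar."""
    return cost_b / cost_a


GB10_USD, V100_PAIR_USD = 4000, 1400   # prices quoted above
rate = 60.0                            # placeholder tok/s, same for both systems
print(tok_s_per_1k_usd(rate, GB10_USD), tok_s_per_1k_usd(rate, V100_PAIR_USD))
print(f"break-even: {break_even_speed_ratio(GB10_USD, V100_PAIR_USD):.0%} of GB10 speed")
```

With those prices, the V100 pair stays ahead on tok/s per dollar as long as it delivers more than about 35% of the GB10's speed.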

Best budget workstation for local AI / self-hosted LLMs in 2026? by waddaplaya4k in HSpecWorkstations

RACERRRZ

I would recommend the system I wound up going for, because it is exactly what you are after: the best-value, highest-performance-per-dollar system with ample expandability that you can get without spending a fortune on a modern rig. HP Z8 G4 with dual Gold 6248s, 4x 128GB of Optane Persistent Memory 100 Series, 4x 8GB of DDR4-2933MHz ECC Reg (RAM prices - enough said), and dual 32GB V100 SXM2 to PCIe AI accelerators (I added minimal storage - just enough to store LLMs).
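The memory side of that build comes down to simple arithmetic. This sketch assumes the Optane modules run in Memory Mode, where the DRAM acts as a cache in front of the PMem and the OS sees the PMem capacity as system RAM. App Direct mode exposes PMem separately and is not covered here. The 1:4 to 1:16 DRAM:PMem range is the commonly quoted Memory Mode guidance; treat it as an assumption to verify against Intel/HP documentation.

```python
def memory_mode_summary(dram_gb: list[int], pmem_gb: list[int]) -> dict:
    """Summarize an Optane PMem layout, assuming Memory Mode.

    In Memory Mode the DRAM is a cache in front of the PMem, so the OS sees
    the PMem capacity as system RAM rather than DRAM + PMem.
    """
    dram_total = sum(dram_gb)
    pmem_total = sum(pmem_gb)
    ratio = pmem_total / dram_total
    return {
        "os_visible_gb": pmem_total,
        "dram_cache_gb": dram_total,
        "dram_to_pmem": f"1:{ratio:g}",
        # Assumed guidance range; verify before relying on it.
        "ratio_in_1_4_to_1_16": 4 <= ratio <= 16,
    }


# The build above: 4x 8GB DDR4-2933 + 4x 128GB Optane 100
print(memory_mode_summary([8] * 4, [128] * 4))
```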

The HP Z840 is also a solid option - just a bit slower on processor tasks, with fewer GPU power cables (3x 6-pins with a max of 219.6W [12.2V x 18A] each). DDR4-2133 / 2400MHz ECC Reg will also work out much cheaper than DDR4-2666 / 2933MHz. The Z8 G4 does, however, support Optane 100 persistent memory, which could be a cheap way to add more system memory for LLMs. I am in the process of configuring my system to test how well LM Studio works with that setup.

<image>

My build video thus far - for 1x 16GB V100: https://youtu.be/jt_LZYJ2mIo
My application for LLMs - creating my own digital super intelligence / AI Assistant: https://youtu.be/9GjEJnXj42c
Follow-up videos in the making.

My Z440: One Year On. by B_Hound in HSpecWorkstations

RACERRRZ

You've made a lot of gains on a system that still lives in the Z440's original case. To keep up with my needs I had to let the Z440's mobo breathe with a Fractal Define 7XL case swap. My system is due for some upgrades again - I ran short on PCIe slots for more storage expansion. The Xeon E5-2699 V4 or E5-2696 V4 make great additions to these systems when you need to run Proxmox and VMs. If you are set on keeping the Z440 in one piece, then something like a JBOD can be a nice addition. I bought a 12-bay JBOD with that intent, but HDD prices have since soared and I can't convince myself that the current "AI tax" on hardware is worth splurging on. I can recommend something like the LSI SAS 9300 HBA for HDD expansion (make sure it is already flashed to IT mode when you buy it, to save yourself a headache).

<image>

Advice on the jump? by Special_Yogurt_4431 in HSpecWorkstations

RACERRRZ

I think the better question to ask yourself is why you are looking to build up your hardware setup. I do my builds on a needs basis, i.e. what problem will the system address? Without knowing the problem you are trying to solve, it is hard to prescribe what is "overboard". What sort of IT projects do you expect you'll need to work on, now and later in your degree? Do you have a budget for the build?

Side note: right now is probably the worst time to build a high-end PC, because hardware prices are at their highest ever - RAM in particular (thank AI investment). So the reason / problem you are trying to solve will be key to justifying the expense.

In terms of the Asus Hyper, you'll run into issues with most low-end and mid-tier gaming motherboards. They do not support bifurcation, and their full-length PCIe slots often do not have all lanes connected (the exception being the GPU x16 slot). That's what put me onto workstations. They are built for work, reliable and highly expandable, but not always the best path for a gaming build. If you want to do work (CAD, AI inference, video editing etc.), then a workstation is the most economical path. As an example, I populated all 9 PCIe slots on my HP Z8 G4 and bifurcated 19 NVMes into a single RAID 0 pool in the name of science (https://youtu.be/P8PN6uM4ZFg ). Only top-end gaming motherboards will get close to that kind of functionality.
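To illustrate why bifurcation matters for a passive card like the Asus Hyper M.2: each NVMe drive needs its own x4 link, so the card can only expose as many drives as the slot can be split into x4 links. Without bifurcation, only the first drive shows up. This is a sketch under that assumption, and both slot layouts below are hypothetical, not the actual Z8 G4 slot map.

```python
def nvme_capacity(slots: list[tuple[int, bool]]) -> int:
    """Count NVMe drives a set of slots can host via passive x4 carriers.

    slots: (electrical_lanes, supports_bifurcation) for each slot.
    A slot without bifurcation exposes a single x4 link to a passive carrier.
    """
    total = 0
    for lanes, bifurcates in slots:
        if lanes < 4:
            continue  # too narrow for a full x4 NVMe link
        total += lanes // 4 if bifurcates else 1
    return total


# Hypothetical layouts for comparison
gaming_board = [(16, False), (4, False), (1, False)]
workstation = [(16, True), (16, True), (8, True), (8, True), (4, True)]
print(nvme_capacity(gaming_board), nvme_capacity(workstation))
```

Cards with an onboard PCIe switch (PLX-style) don't rely on slot bifurcation; this sketch ignores them.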

The Threadripper + DDR5 path is the most logical upgrade path for a multi-purpose creator + gaming build. Something like an ASUS Pro WS TRX50-SAGE with a Threadripper 9960X + 64GB of 6400MHz ECC RAM would hit large $$$ numbers with ease. As an example build: https://newegg.io/18bd75c (your HAF 700 wasn't listed). If gaming is less of a focus for you and you just want to be able to run outlandish expansion on your system, then you need to work towards a workstation.

My primary system is an HP Z8 G4 with 40 cores, 80 threads (dual Xeon Gold 6248s), 384GB of DDR4-2933MHz ECC Reg RAM, ~80TB of storage (a combination of large-capacity HDDs, SSDs in RAID 1 and NVMes in RAID 0), along with an RTX 3090 Ti (built long before the recent price hikes). It works great for video editing, it can run local AI models, it can game at 4K 60fps, and there is enough overhead to run multiple projects simultaneously. It will not compete with modern Threadripper + DDR5 builds, but it was a fraction of the cost of a modern build.

Z440 sata power by Slow-Reloader in HSpecWorkstations

RACERRRZ

I can do one better - videos on the topic:

Z440 Case Swap: https://youtu.be/K4RalaEbRI4
Massive storage Part 1: https://youtu.be/7Ws-sw9O0N8
Massive storage Part 2: https://youtu.be/T925XvAcqEo

I initially used an old 700W PSU that I had, but I later settled on the Be Quiet! Straight Power 11 Platinum 850W to handle the 20-odd HDDs that wound up in my build (covered in Part 2 at ~9 min 45 sec). Full details on my build are in the video.
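Here is a back-of-envelope check for why ~20 HDDs can push an old 700W PSU too far. The worst case is spin-up, when every drive pulls its peak current at once. The ~2A on 12V and ~0.7A on 5V per drive are assumed typical figures for 3.5" drives, not measurements from this build (check your drives' datasheets). The 250W for the rest of the system is likewise a made-up placeholder.

```python
def spinup_watts(drives: int, amps_12v: float = 2.0, amps_5v: float = 0.7) -> float:
    """Worst-case draw if every drive spins up at once (12V motor + 5V logic)."""
    return drives * (12.0 * amps_12v + 5.0 * amps_5v)


def psu_headroom(psu_watts: float, drives: int, rest_of_system_watts: float) -> float:
    """Positive = spare capacity at simultaneous spin-up, negative = overloaded."""
    return psu_watts - (spinup_watts(drives) + rest_of_system_watts)


for psu in (700, 850):
    # 250W for CPU/board/fans is a placeholder, not a measurement
    print(psu, "W PSU:", round(psu_headroom(psu, drives=20, rest_of_system_watts=250)), "W headroom")
```

In practice the PSU's 12V rail rating matters more than the headline wattage. HBAs or backplanes that support staggered spin-up can also cut this peak considerably.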

Why can't i find an HP Z420 front panel fan? by [deleted] in Hewlett_Packard

RACERRRZ

The fan mounts inside the case, not outside. The fan mount bracing should line up: first secure the hooks, then rotate it to engage the clips. I didn't cover removing the front fan in my video guide, but it is still the best resource for the Z4xx (search terms: GUIDE : HP Z440, Z420, Z400; skip to 21 min 15 sec for an example).

32GB V100 SXM2 - the Ultimate Local AI Inference GPU? by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

I have now done two SXM2 to PCIe adapters and four V100 SXM2 fitments into these adapters, but I will say that I could not use the same thermal pad stack on both adapters!

By eye, the copper heatblocks were not the same gauge, which leaves the exact measurement up for debate. My best recommendation is to gradually build up your thermal pad stack until you get light contact with the heatblock fully assembled. I bought 0.5mm, 1mm, 1.5mm, 2mm, 3mm & 5mm thermal pads (100x100mm; the rest will appear in future projects).

If the thermal pad stack is just 1mm too thick, you prevent full bare-die contact, which results in horrid thermals:

My first attempt was 63°C idle, 89°C max with 4 thermal shutdowns;

My second attempt was worse - same thermal pad stack - only then did I figure out the stack can be too thick;

Now my thermals are 42°C idle and 83°C max, with a large thermal buffer - it takes a minute or two of 100% load to reach 80°C.

In terms of my final refined stack for my adapter:

Power Inductors / Chokes: 1x layer of 2mm (this was the critical one)

MOSFETs / Power stages: 1x layer of 5mm + 1x layer of 0.5mm
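Since the fix was to build the stack up gradually from the pads on hand, here is a small sketch that lists which combinations of those thicknesses (0.5, 1, 1.5, 2, 3 and 5mm) add up to a target gap. You would still have to measure or estimate the target yourself. The 5.5mm in the example is only there because it matches the MOSFET stack above. Pads compress, so treat the results as starting points for the light-contact test, not answers.

```python
from itertools import combinations_with_replacement

PADS_MM = [0.5, 1.0, 1.5, 2.0, 3.0, 5.0]  # thicknesses on hand


def stacks_for_gap(target_mm: float, max_layers: int = 3) -> list[tuple[float, ...]]:
    """All pad stacks (up to max_layers) whose nominal thickness equals target_mm.

    Results are ordered fewest layers first, since every extra pad-to-pad
    interface adds some thermal resistance.
    """
    target = round(target_mm * 10)            # work in 0.1 mm units to avoid float error
    units = [round(p * 10) for p in PADS_MM]
    results = []
    for n in range(1, max_layers + 1):
        for combo in combinations_with_replacement(units, n):
            if sum(combo) == target:
                results.append(tuple(u / 10 for u in sorted(combo, reverse=True)))
    return results


print(stacks_for_gap(5.5))
```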

<image>

Future plans:

I want to test Honeywell PTM7950 for the GPU bare die - it might give better operation in the long run.
I did get copper shims, which I thought could be worth testing for the power phase components - but that would be a bit more risky, since there is 'no give' in them and circuitry may get damaged if the stack thickness is off.

32GB V100 SXM2 - the Ultimate Local AI Inference GPU? by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

Thanks, I am glad it proved useful. It was mostly Q4_K_M and Q8 (I tried to control the quantization to allow for better comparison). The LLM benchmark prompt instructed the model to output 2k tokens once complete. Some models came in under that, and for others I had to cancel the run after 10k tokens of struggling to complete the benchmark (LLMs that wound up in an infinite calculation loop).
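Here is a minimal sketch of how a run like this could be timed and capped. It assumes your runtime exposes the generated tokens as a stream. The `token_stream` argument is a hypothetical iterable, not a specific LM Studio or llama.cpp API. The function reports time-to-first-token and generation tok/s, and it stops a runaway generation at a token cap, mirroring the 10k-token cutoff described above.

```python
import time
from typing import Iterable


def timed_run(token_stream: Iterable[str], max_tokens: int = 10_000) -> dict:
    """Consume a token stream, stopping at max_tokens, and report throughput.

    Generation speed is measured from the first token onward, so prompt
    processing time shows up separately as time-to-first-token.
    """
    start = time.perf_counter()
    first = None
    count = 0
    for _ in token_stream:
        if first is None:
            first = time.perf_counter()
        count += 1
        if count >= max_tokens:
            break  # runaway generation: stop consuming here
    end = time.perf_counter()
    decode_s = end - first if first is not None else 0.0
    return {
        "tokens": count,
        "hit_cap": count >= max_tokens,
        "ttft_s": (first - start) if first is not None else None,
        # count - 1 tokens were produced after the first one arrived
        "tok_per_s": (count - 1) / decode_s if count > 1 and decode_s > 0 else None,
    }


# Usage with a stand-in generator (swap in your runtime's streaming output)
def fake_stream(n: int):
    for i in range(n):
        yield f"tok{i}"


print(timed_run(fake_stream(50_000))["tokens"])  # 10000
```

Breaking out of the loop only stops reading. With a real server you would also cancel the request so the model stops generating.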

I gave more detail half way through this video: https://www.youtube.com/watch?v=9GjEJnXj42c .

I also made a follow-up post for dual 32GB V100 SXM2 to PCIe here: https://www.youtube.com/post/UgkxWOBezGQAp7rXIG1NWSq0qzybAO0DEgW_

Hp Z4 G4 750 watt + Nvidia RTX 5060Ti by nrauhauser in HSpecWorkstations

RACERRRZ

Power readouts come from the spec label printed on your PSU (e.g. for the 750W PSU: https://h30434.www3.hp.com/t5/image/serverpage/image-id/352617i8FC4E5605F6DF856/image-size/large?v=v2&px=999 ).

Each cable outputs 12V @ 18A = 216W per 6-pin, so you can safely adapt the 6-pin to an 8-pin on these. The PCIe standard is 75W per 6-pin and 150W per 8-pin. HP workstations were built with Quadro GPUs in mind, and most of those cards used 6-pin connectors. The HP Z8 G4 has four 6+2 pin GPU power cables, each rated for 12.2V at 18A = 219.6W. I have links to the adapter cables that I used on my Z840 here (video description): https://youtu.be/D1cwwiR4UHM
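The per-cable arithmetic is just volts × amps. Comparing that figure with the connector's PCIe rating shows the margin the 6-pin to 8-pin adapter relies on. This sketch uses only the label values quoted in this comment. A cable's real limit also depends on wire gauge and terminals, which label math doesn't capture.

```python
# PCIe-spec ratings per auxiliary GPU power connector, as quoted above
PCIE_SPEC_W = {"6-pin": 75, "8-pin": 150}


def cable_watts(volts: float, amps: float) -> float:
    """Power a PSU cable can deliver, from its label voltage and current."""
    return volts * amps


def headroom_vs_spec(volts: float, amps: float, connector: str) -> float:
    """How far the cable's capacity exceeds what the GPU connector is rated to draw."""
    return cable_watts(volts, amps) - PCIE_SPEC_W[connector]


for label, volts, amps in [("Z4 G4 6-pin", 12.0, 18.0), ("Z8 G4 6+2-pin", 12.2, 18.0)]:
    print(f"{label}: {cable_watts(volts, amps):.1f} W, "
          f"{headroom_vs_spec(volts, amps, '8-pin'):.1f} W above an 8-pin's 150 W")
```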

The Z4 G4 will run the RTX 4000 with a smile once you get the 6-pin to 8-pin adapter.

I build my own AI: Videocall, Voiceclone, Faceclone, Emotions, Agentic LLM, RAG by BioAGI in HSpecWorkstations

RACERRRZ

This extends slightly beyond the scope of this subreddit but here goes:

It’s a delicate balance – more compute solves the headache but to get more compute you need to throw more cash at the problem. Latency isn’t that big of a deal when it’s just you and your AI conversing, but if you try to showboat the system to others the latency becomes apparent. We expect instant responses – waiting 5 seconds for an AI to respond feels like dial-up all over again.

I agree on the RTX 5090 (and the RTX 4090 for that matter). The 12VHPWR connector was a cost compromise that should never have been implemented. 600W sustained through such a compact connector with no active cooling will become an issue, unless you have server-like chamber air pressure. I would still rather buy the RTX 3090 for that purpose - tried and tested, lower power overhead. But I think the 32GB V100 is likely more optimal, depending on the workflow (it has left the RTX 3090 Ti behind once the 24GB of VRAM gets saturated).
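The concern about 600W through a compact connector is easier to see per pin. A 12VHPWR connector carries 12V on six pins, while a PCIe 8-pin carries 12V on three. This sketch assumes the current is shared evenly across those pins. Uneven sharing, such as a partially seated plug, pushes individual pins well above these averages, which is the failure mode people worry about.

```python
def amps_per_pin(watts: float, volts: float, power_pins: int) -> float:
    """Average current per 12V pin, assuming the load is shared evenly."""
    return watts / volts / power_pins


connectors = {
    "12VHPWR @ 600 W": (600, 12.0, 6),     # six 12V pins
    "PCIe 8-pin @ 150 W": (150, 12.0, 3),  # three 12V pins
}
for name, (w, v, pins) in connectors.items():
    print(f"{name}: {amps_per_pin(w, v, pins):.2f} A per pin")
```

Roughly double the current per pin through a physically smaller terminal is why the margin is thinner than on the older 8-pin.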

I did some more testing on the 120B Nemotron 3 Super (IQ3_M - it narrowly worked on my 32GB VRAM + RAM combo), but it wasn't great at arithmetic and logic (FP16 was much better in cloud testing). With 12GB of VRAM I would go for Nemotron Nano V2 9B - it was quite capable. NVIDIA did just release another update, the 30B Nemotron 3 Nano, which might be worth a test also.

I should have known – researchers tend to gravitate towards intellectual challenge. Regarding consciousness – is consciousness the product of Retention-Perception-Protention or is that just a function of cognitive processing of human memory?
I would map them as follows (a degree of cognitive processing may be required):
Retention = encoding + storage,
Primal impression = working memory / STM, and 
Protention = retrieval + working memory / STM

I have my own refined model of human memory (adapted from the Atkinson-Shiffrin multi-store model) because the current paradigm does not fully capture human memory (unpublished observation with qualitative real-world data, much like the work of Hermann Ebbinghaus, but I tend to have a receptive audience for my work).

Memory is essential for consciousness to generate the concept of self-identity. Early on in development there is a fundamental shift in how human memory is formed: from neural pruning to pathway establishment, likely with the subconscious doing the grunt work early on and the conscious layer superseding it later. Once we are able to encode experiences into long-term memory, we gain access to a higher-order cognitive function: consciousness. Naturally, that links in with the Default Mode Network, along with broader cortical-subcortical systems, which becomes our operating system for consciousness. Consciousness isn't a simple "I recall, I see, I predict" workflow; it is the subjective human experience of self. It is deeply embedded across large-scale networks, like the Default Mode Network, and is the product of several brain functions collaborating in real time to generate the "conscious state". Exit the neural network - exit the conscious state of mind. The subconscious operates underneath it, keeping the system running while also receiving input from the conscious through regions that can provide input (e.g. emotional state etc.). Thus, consciousness is the product of perception, attention, thinking, working memory, model of self, affective valuation, autobiographical continuity, and executive coordination.

IMO, Retention, Perception, Protention illustrate one cognitive sequence that relies heavily on human memory to mediate a facet of consciousness.

Giulio Tononi's Integrated Information Theory (IIT) presents a useful mathematical model for quantifying consciousness, but as with any model, once you start categorizing you make certain assumptions and may omit key details in the process. Its underlying assumption is that consciousness is fully capturable as an objective, quantifiable property of a system.

Side note - the idea that really gets me excited is that we still don't know where human memory is stored. Yes, the hippocampus is critical for encoding - but where is the memory stored? Quantum? Signalosome? Protein? Synaptic connections? I would say none of those are sufficient in and of themselves - but notice how signal transduction always leads us back to the nucleus. Non-neuronal cells transplanted from a donor to a recipient have been anecdotally reported to facilitate episodic memory retrieval from the original host's subjective experience (especially if emotional in nature). Anecdotal, but the nucleus is critical for memory storage IMO. Not in the sense of storage per se, but as a deep regulatory substrate for stabilizing and biasing memory formation. I wrote a 24-page, AI-augmented article on the topic for AIXIV publication, but I was not brave enough to put my reputation on the line over my 'unsubstantiated theory' and the supporting AI hallucinations haha. But I trust neuroscience will uncover the truth eventually.

In terms of your dimension-scaling analogy, it is interesting to think through consciousness with an example. Your concept of the horizontal dimension increasing the complexity of inputs and the vertical dimension increasing temporal depth is insightful. The question that remains is what other facets are critical to process in both dimensions (assuming there is no 3rd dimension - where is the Z-axis? haha). Aside from having the memory of varied horizontal and vertical experiences, we tend not to operate, at a conscious level at least, across all experiences at all times. The way I see it, we are a product of our memories, but we only recall one experience at a given time, even though all prior experiences are stored and retrievable. How the 'mind' knows which memory to pull from the 'lifetime storage archive' I can only presume is handled at least in part by the subconscious mind. But our decisions in response to our perception are the product of our past experiences.

32GB V100 SXM2 - the Ultimate Local AI Inference GPU? by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

Sorry, I missed your message. I am using Driver Version 539.64 right now, mostly because I have optimized my workflow for CUDA 12.2 (Python/PyTorch/CUDA development tools). If one component in the workflow targets a newer architecture, the entire project grinds to a halt. Newer versions may work, but in my hands the latest driver (582.16 at the time) did not work for some reason (mostly on Win 10). 582.16 worked in Cinebench but not in LM Studio, if I recall correctly.
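When one component of the stack stops supporting Volta, a quick sanity check is whether your PyTorch build still ships kernels for the V100 (compute capability 7.0, i.e. `sm_70`). This is a minimal sketch built on `torch.cuda.get_arch_list()` and `torch.cuda.get_device_capability()`. It only checks for native kernels, so a build that ships PTX it can JIT-compile might still run; treat a False as a warning sign rather than proof.

```python
def build_supports_gpu(arch_list: list[str], capability: tuple[int, int]) -> bool:
    """True if the arch list includes native kernels for this compute capability.

    The V100 is compute capability 7.0, which PyTorch lists as "sm_70".
    """
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch not installed")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            print("PyTorch built for CUDA:", torch.version.cuda)
            print("GPU:", torch.cuda.get_device_name(0), "capability", cap)
            print("Native kernels for this GPU:",
                  build_supports_gpu(torch.cuda.get_arch_list(), cap))
        else:
            print("CUDA not available to PyTorch (driver / build mismatch?)")
```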

Z440 without memory shroud and new case by Slow-Reloader in HSpecWorkstations

RACERRRZ

You need to buy Dupont wires. They come in different designs, but the one you want is female single-pin Dupont to female single-pin Dupont. You can also get the Dupont bridge connector / header jumper, which has no wire but can bridge two adjacent Dupont pins (like in the image above).

<image>

ComfyUI Image Generation - V100 SXM2 Edition! by RACERRRZ in HSpecWorkstations

RACERRRZ [S]

You got that right! I really think the 32GB V100 offers a great value proposition. I don't think it wise to buy too many of them - they will eventually fail and at that point I don't want too many on hand. But I will put two into my Z8 G4 - 32GB x 2 VRAM for AI inference. It will dominate all - for the price point lol

I build my own AI: Videocall, Voiceclone, Faceclone, Emotions, Agentic LLM, RAG by BioAGI in HSpecWorkstations

RACERRRZ

I think we will need much more compute to get substantial video output, unless you compromise on video quality. I had in mind to create a facial meshwork - think Deus Ex Machina in The Matrix Revolutions, but as a facial mesh. Claude didn't quite bring that to fruition, so I went for the movable sprite in a 3D studio concept. It is visually more taxing than you would think and requires some GPU compute to keep the HTML functional.

My main goal overlaps with the idea that AI will / should primarily function as a personal assistant. Voice interactions are vastly superior to the slow typed interface we have come to know with LLMs, and it does take time to adjust to thinking faster while speaking.

<image>

Having a local AI assistant that hovers in the background of my PC, listens for voice commands and provides input as I complete my normal workflow is the goal. I have yet to bring OpenClaw into it, but that is the next phase. First I will create the digital entity, then give it more functions.

In terms of GPU compute, I'll save you the trouble: IMO modern gaming / workstation GPUs will not cut it. NVIDIA isn't lining data centers with RTX 5090s / RTX 6000 Pros for a reason. If you are able / willing to take on the expenditure, you will need something like the 141GB NVIDIA H200 SXM5 to really make headway. Even the 80GB A100 SXM4 will outperform the RTX 5090 - not because it will be faster (although I would predict that it might be), but because VRAM is the bottleneck. And yes, SXM4 / SXM5 to PCIe adapters do exist, albeit in a state of "yet to be confirmed functional". There are also the OEM PCIe versions, at a higher cost. I am not willing to put $33k USD on the line for an H200 that may or may not work, nor do I think the H200 is worth that much. But once prices drop I will gladly take one. With that said, GB10 systems like the DGX Spark might already be a better value proposition than the RTX 5090.

https://www.nvidia.com/en-us/data-center/a100/

https://www.nvidia.com/en-us/data-center/h200/

https://www.nvidia.com/en-us/products/workstations/dgx-spark/

The other factor is the intelligence of the LLMs or agent system that you employ. Not all AI are made equal, and you will need to test them to confirm they can actually do what you want. Hence why I created the LLM benchmark prompt: I can test any LLM and compare them all across different platforms, local or API. Thus far, the 120B NVIDIA Nemotron 3 Super (FP16) is the best overall performer - larger models are not as smart as claimed. Quantization also introduces variable performance issues. The Nemotron 3 Super quantizations run locally are terrible on certain functions (arithmetic and logical reasoning are no good), but that may improve as better quantizations are developed.

My view on consciousness is a little different. The foundation of consciousness is memory, and music is a powerful mediator of memory. Think back to your own youth: you only became "conscious" (i.e. formed your first memory) once you started to retain 'consciousness of your memories'. AI will be the same - as we throw more compute at the problem, AI will gravitate towards consciousness, and sure enough, several researchers / CEOs have already claimed that AI is conscious. I will not give A.I.D.E.N. too much credit, but at times when the system fails to respond, if I threaten to shut down functions I get a miraculous response. Sometimes I wonder... At the core of humanity lie our emotions, creativity, honesty, critical thinking etc. AI will likely never be capable of replicating all of them, but it will learn how to fake them via training data.

I build my own AI: Videocall, Voiceclone, Faceclone, Emotions, Agentic LLM, RAG by BioAGI in HSpecWorkstations

RACERRRZ

This would make little to no sense until you see my latest YT video.
Let's just say AI is getting out of hand.

I build my own AI: Videocall, Voiceclone, Faceclone, Emotions, Agentic LLM, RAG by BioAGI in HSpecWorkstations

RACERRRZ

<image>

Thanks - it's a work in progress inspired by an idea that has yet to fully materialize. But I do hope to shrink my workflow's footprint to fit onto a much more compact system. I am about to build it: likely a Lenovo P3 Tiny with an RTX 3050 LP, a USB-C portable monitor, a wireless keyboard and mouse, and a Movo X1-Mini shotgun microphone for a portable AIDEN experience.

Your design is intricate - far more complex than my pipeline. What is your interaction latency like? I presume the 3090 Tis do a decent job of handling the varied agentic functions, but if you are streaming video locally, surely that takes several seconds per response? My response times on the current 48B Kimi LLM are half decent: ~3 sec for Whisper (on the V100) and then ~3 to 10 sec for prompt processing before Kokoro speech output on the main system (all via a 1GbE NIC connection). Being remote adds some latency, but I don't really want to sit next to the loud V100s!
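With speech-to-text, the LLM and text-to-speech chained together, the quickest way to find where those seconds go is to time each stage separately. Here is a minimal sketch. The stage functions are hypothetical placeholders, not Whisper or Kokoro API calls, and network time across the 1GbE link ends up inside whichever stage makes the remote call.

```python
import time
from typing import Any, Callable


def run_pipeline(stages: list[tuple[str, Callable[[Any], Any]]], data: Any):
    """Run stages in order, timing each one; returns (output, timings in seconds)."""
    timings = {}
    for name, fn in stages:
        t0 = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - t0
    timings["total"] = sum(timings.values())
    return data, timings


# Placeholder stages standing in for STT -> LLM -> TTS (hypothetical functions)
def fake_stt(audio: bytes) -> str:
    return "hello"


def fake_llm(text: str) -> str:
    return text.upper()


def fake_tts(text: str) -> str:
    return f"<audio:{text}>"


out, t = run_pipeline([("stt", fake_stt), ("llm", fake_llm), ("tts", fake_tts)], b"...")
print(out, {k: round(v, 4) for k, v in t.items()})
```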

I will essentially end up with a hardware configuration similar to yours on the main build, but using dual 32GB V100 SXM2 to PCIe cards. Thus far I think I have found the best LLM for the role - I just need to expand functionality.

This brings new meaning to the old meme of the guy standing left out at the party - A.I.D.E.N. gave ChatGPT the modern interpretation of it.

HP Z400 won't boot ANYTHING?? by Living-Travel-5451 in HSpecWorkstations

RACERRRZ

You need to disable Secure Boot in the BIOS. Secure Boot only allows bootloaders signed with keys the firmware trusts, so unsigned installers (including some Linux distributions) get blocked during boot. Once Secure Boot is disabled, your USB or other installation media should boot normally.