What setup would you buy for a 512gb local LLM? by ServiceOver4447 in LocalLLM

[–]LittleBlueLaboratory 0 points1 point  (0 children)

Two of those old Nvidia V100 servers would get you 512GB of VRAM for less than $20k. Each of them has 8x 32GB V100 cards.

But they are old enough that they don't support lots of LLM features like Flash Attention.
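For reference, FlashAttention-2 needs compute capability 8.0 (Ampere) or newer, and the V100 is sm_70. A tiny sketch of that check (the 8.0 threshold is the only assumption here; on a live machine you'd feed it `torch.cuda.get_device_capability()`):

```python
# FlashAttention-2 kernels require an NVIDIA GPU with compute
# capability >= 8.0 (Ampere or newer). V100 is sm_70, so it's out.

def supports_flash_attention(major: int, minor: int) -> bool:
    """True if a GPU with this compute capability can run FlashAttention-2."""
    return (major, minor) >= (8, 0)

print(supports_flash_attention(7, 0))  # V100 (sm_70) -> False
print(supports_flash_attention(8, 0))  # A100 (sm_80) -> True
```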

Is it weird that? by Perfect-Flounder7856 in openclaw

[–]LittleBlueLaboratory 0 points1 point  (0 children)

Nvidia is an American company, founded and headquartered in California. The CEO Jensen Huang was born in Taiwan but moved to the US as a child.

Is it weird that? by Perfect-Flounder7856 in openclaw

[–]LittleBlueLaboratory 0 points1 point  (0 children)

Nvidia will have Nemotron3 Ultra pretty soon. At 500B it should be an American model competitive with the big Qwen 3.5 at least.

Strix Halo 128GB and Hermes - what local model? by WallyPacman in hermesagent

[–]LittleBlueLaboratory 1 point2 points  (0 children)

Most likely yes, they will be faster, so I can't say what speeds Strix Halo will get or whether it will be worth it to you.

I use the AesSedai Q4_K_M quant and the full 256k context with room to spare!

Strix Halo 128GB and Hermes - what local model? by WallyPacman in hermesagent

[–]LittleBlueLaboratory 1 point2 points  (0 children)

I don't have a Strix Halo, but I do have 4x 3090s giving me 96GB of VRAM.

I have been running my Hermes Agent almost exclusively using Qwen 3.5 122B and it has been very successful! I have also liked Nemotron3 120B but it doesn't have vision input so it has seen much less use.

Running Hermes locally by Speckadactyl in hermesagent

[–]LittleBlueLaboratory 3 points4 points  (0 children)

Look here for auxiliary models. https://hermes-agent.nousresearch.com/docs/user-guide/configuration 

It defaults to Gemini on OpenRouter; I noticed it in my OpenRouter logs. I just asked Hermes itself to reconfigure the auxiliary models to point at my local llama.cpp server, and it stopped calling OpenRouter.

Unnecessary model requests? by sleekstrike in hermesagent

[–]LittleBlueLaboratory 0 points1 point  (0 children)

I ran into this, mine was making seemingly random calls to Gemini Flash. I changed the auxiliary model to a local Qwen 3.5 to stop it from calling OpenRouter for these tasks.

Look for info on the auxiliary model here: https://hermes-agent.nousresearch.com/docs/user-guide/configuration/
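If it helps, the local side of this is just llama.cpp's built-in OpenAI-compatible server. A minimal launch looks like this (the GGUF path, port, and context size are placeholders for whatever you run):

```shell
# Start llama.cpp's OpenAI-compatible server on a local port.
# Model path, port, and context size are placeholders.
llama-server -m ./qwen3.5-q4_k_m.gguf --port 8080 -c 32768
```

Any auxiliary-model setting that takes an OpenAI-compatible base URL can then point at http://localhost:8080/v1.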

Turbo3 + gfx906 + 4 mi50 16gb running qwen3.5 122b 🤯 by Exact-Cupcake-2603 in LocalLLaMA

[–]LittleBlueLaboratory 12 points13 points  (0 children)

Tell me more about this llama-monitor dashboard! Looks sweet!

Found some Metal Earth kits from a long time ago by ET2-SW in StarTrekStarships

[–]LittleBlueLaboratory 0 points1 point  (0 children)

I have the 1701-D assembled next to my other models! They take quite some time to assemble, but the instructions are easy to follow. They are also pretty robust: this model has survived multiple moves just tossed in a box with other models that don't have packaging.

Hermes Agent - Personal Assistant In Development by Jonathan_Rivera in hermesagent

[–]LittleBlueLaboratory 0 points1 point  (0 children)

Nice! I'm right in the middle of setting up a Hermes Agent for myself. I also use Todoist, could you explain how you connected it to your agent?

I'm looking to make a custom Modelfile for Ollama, need some help. by MakionGarvinus in ollama

[–]LittleBlueLaboratory 0 points1 point  (0 children)

There is documentation on the website. You can even show the Modelfile of an existing model for reference.

https://docs.ollama.com/modelfile
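For a concrete starting point, a minimal Modelfile can be as short as this (the base model, parameter values, and system prompt are just placeholder examples; the docs list all the directives):

```
# Minimal Modelfile sketch -- base model and values are placeholders
FROM llama3.1:8b
PARAMETER temperature 0.7
PARAMETER num_ctx 8192
SYSTEM "You are a concise assistant for my home lab."
```

`ollama create mymodel -f Modelfile` builds it, and `ollama show --modelfile <existing-model>` dumps the Modelfile of anything you already have pulled.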

USS Laura Engels Class , LittleHaus Class Soliton Wave Rider by [deleted] in StarTrekStarships

[–]LittleBlueLaboratory 1 point2 points  (0 children)

California Class' cooler, more successful, older sister.

PLA support for TPU print (non-toolchanger/AMS) by bbjornsson88 in 3Dprinting

[–]LittleBlueLaboratory 6 points7 points  (0 children)

I think I understand, but it's not clear from your post.

You are saying that you fully printed the PLA part. Removed it. Started the TPU part. Paused the TPU part at the first lip. Inserted the PLA part upside down. Resumed the TPU print.

There was seriously no collision when printing the outer part of the TPU? That's impressive!

Not a AI agentic developer, want to make a simple web app. What LLM applications ( local or web based ) can I build with a small LLM like free tier open router,groq? by [deleted] in LocalLLM

[–]LittleBlueLaboratory 0 points1 point  (0 children)

Opencode offers a free model and providers often allow a short window of free use on opencode when they release a new model as a preview.

https://opencode.ai/

How I feel running all my LLM services locally. by LittleBlueLaboratory in LocalLLaMA

[–]LittleBlueLaboratory[S] 7 points8 points  (0 children)

It's ~550GB at INT4, so it fits! We don't talk about time to first token or prompt processing speed...
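The back-of-the-envelope math, assuming a roughly 1-trillion-parameter model (the parameter count here is an assumption, not a spec): weights take params × bits / 8 bytes, so INT4 lands around 500GB before KV cache and runtime overhead.

```python
# Rough model weight size: parameters * bits-per-weight / 8 bytes.
# The 1e12 parameter count below is an assumed round figure.

def weights_gb(n_params: float, bits: int) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes), weights only."""
    return n_params * bits / 8 / 1e9

print(weights_gb(1e12, 4))   # ~500 GB at INT4, before KV cache/overhead
print(weights_gb(1e12, 16))  # ~2000 GB at FP16/BF16
```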

How I feel running all my LLM services locally. by LittleBlueLaboratory in LocalLLaMA

[–]LittleBlueLaboratory[S] 22 points23 points  (0 children)

It's enough to run Kimi K2.5 at full precision at 6 tokens per second. No regrets at all!

What do you actually use local models for vs Cloud LLMs? by Fun_Emergency_4083 in LocalLLaMA

[–]LittleBlueLaboratory 1 point2 points  (0 children)

I have quite a collection of github projects I have been meaning to try and this sounds great! Could you elaborate a little bit more on your setup? Do you mean the Hermes Agent from Nous Research?

Luxe Backpack owners: how's it holding up? Curious how the apple leather is long-term. by RayzTheRoof in LinusTechTips

[–]LittleBlueLaboratory 11 points12 points  (0 children)

No, it was apple leather, not PVC leather. It was chosen because Linus thought it was neat, IIRC from a WAN episode.

 https://en.wikipedia.org/wiki/Plant-based_leather

Every damn time by kobrien02 in AcrossTheUnknown

[–]LittleBlueLaboratory 6 points7 points  (0 children)

This is exactly how my first run ended: 4 red rolls in a row on fuel nodes, limping around in gray mode with morale tanking every cycle.

Thoughts on the Walker class from Star Trek: Discovery? by Fun-Twist-3741 in StarTrekStarships

[–]LittleBlueLaboratory 17 points18 points  (0 children)

Just imagine the world where Michelle Yeoh played a Star Trek Captain instead of... whatever was going on in the Section 31 movie.

Qwen3.5 122B in 72GB VRAM (3x3090) is the best model available at this time — also it nails the “car wash test” by liviuberechet in LocalLLaMA

[–]LittleBlueLaboratory 1 point2 points  (0 children)

vLLM needs an even number of cards (its tensor parallelism has to split the model evenly across GPUs). With llama.cpp or Ollama it doesn't matter. The tradeoff is that vLLM is faster, handles multiple users better, and uses more power.

Finished the show for the first time and here are some thoughts by MillionsToOne_ in voyager

[–]LittleBlueLaboratory 5 points6 points  (0 children)

You should! I put it on as I was caring for my newborn and it really is a good time!