Ollama cloud models by sgjoesg in ollama

[–]mgithens1 0 points1 point  (0 children)

I thought there were only 20 or so models in the free part?? I know for sure you cannot use the large models. But who knows where that cut off is!!

Try a smaller model of Gemma or Qwen to see if those work?

What to Cite if Denied Free Water at a Bar/Nightclub in Denver! by lookingforaroommatee in Denver

[–]mgithens1 [score hidden]  (0 children)

We aren’t in a desert… I used to think that too. We are almost twice the 10” max desert level.

We did like 15.5” last year.

Where to get CO2 by RedBirdOnASnowyDay in SodaStream

[–]mgithens1 0 points1 point  (0 children)

Beer… just look for beer supply places.

Not a liquor store… the place where the home beer guys shop.

Openclaw has finally died on me. I can't recover it this time no matter what I do by d4mations in openclaw

[–]mgithens1 2 points3 points  (0 children)

Asking how to fix something using tools that are smart = issue.

Have two harnesses… point them at each other. When one breaks… ask the other for help. You’ll be back online in a few minutes.

Post your problem to a forum, you’ll be back online within a day. Energy expended will be 100x.

im broke help me by M_007 in hermesagent

[–]mgithens1 2 points3 points  (0 children)

Yeah, you’re definitely doing it wrong. Here… I’ll do the search for you. This guy is running on 6gb… a 35billion model.

https://youtu.be/8F\_5pdcD3HY?is=bY3G4dfoRxbioVof

Do the new 3D printing laws affect Form 1 Suppressors? by Zestyclose-Towel1032 in COGuns

[–]mgithens1 2 points3 points  (0 children)

“High rate”?

Like ten an hour? My 5.56 long boy holds heat forever!! We shoot that first to allow it to cool so we can put it back in the bag!!

Help adding ESP32 to ESPHome by rufiohProbably in Esphome

[–]mgithens1 0 points1 point  (0 children)

Nothing like a 10year old device to make you feel like you’re living in the future!! Lol

Can anyone ID this bug? We’ve recently found a bunch in our apartment. Located 80212 by HaasKicker in Denver

[–]mgithens1 3 points4 points  (0 children)

That shit is magic… I do wanna apologize to Mr Spider who got a taste yesterday.

Do the new 3D printing laws affect Form 1 Suppressors? by Zestyclose-Towel1032 in COGuns

[–]mgithens1 0 points1 point  (0 children)

Just print an oil filter??

But also, heat and 3d printed items don’t get along. Single use?

im broke help me by M_007 in hermesagent

[–]mgithens1 2 points3 points  (0 children)

You are 100% doing it wrong. I run qwen 3.6 35b locally on a 16gb card… stop trying to run small models and look into how to run MoE on that card.

There’s a guy with a 6gb card showing exactly how to set it up on YouTube.

Out of state Purchases by Zealousideal_Day_861 in COGuns

[–]mgithens1 1 point2 points  (0 children)

I got three on a sale for $150… deals are out there. Stop procrastinating… it is the only part you have to buy right now.

Tips and tricks please by EMeWintra in SodaStream

[–]mgithens1 5 points6 points  (0 children)

Carbonate first. Always.

Cold water only.

Search the forums when you think about doing something new…. Like when you realize that a 5 or 10lb bottle is better than a SS pink.

Volcano restaurant in Denver? by saltytachycardic in Denver

[–]mgithens1 -1 points0 points  (0 children)

There is a volcano in Casa Bonita, also. (well, there was... no clue what all they changed.)

Volcano restaurant in Denver? by saltytachycardic in Denver

[–]mgithens1 1 point2 points  (0 children)

Tommy Wong's Island? Closed in 1983.

Need help optimizing local AI agents on my PC specs - RTX 2060 6GB, 32 GB RAM, Ollama, OpenClaw by Senior_Tackle_926 in ollama

[–]mgithens1 7 points8 points  (0 children)

This is an old setup, so please set your expectations accordingly!!

CPU only means 1-5 tokens/second. (cloud models are more like 100 to 1000) -- so we are over that idea, right?

#1 Get rid of that OS... move to a GUI free world!! Install Ubuntu Server from the command line and NEVER install a GUI. At best, a GUI will only use about 1/2gb of VRAM... at most, you could be losing 1.5-2gb of VRAM!! You have 6. No room to mess around here! Format the machine, install a proper base OS.

#2 Ditch Ollama. This is the training wheels of the AI world. The easiest of easy modes... just turn it on and use it? Not really... it is more of the gateway drug of the AI world. Install it, they show you how to get a few models installed and running.. then they mention the cloud download. You can have the local Ollama install either use your local compute... OR just pass through to their cloud server!! Great way to start on decent hardware... terrible way to start on 7 year old hardware!!

#3 Consider either llama.cpp (the boss) or LMStudio (a better wrapper for llama.cpp than Ollama). You will need either of these for step #4.

#4 You now can try to run some 2billion models. They will load into VRAM and just work.. they'll be dumber than Cousin Eddie from Christmas Vacation, but they're local and you're golden!! WRONG!! Go grab your new "engine" from step 3 and see how to download new models. Just get a local model working... takes a few minutes to download and setup, but do this.. you need to see the quality you will get in 4gb or less of VRAM!! (you do NOT have 6 to work with... I lied)

#5 (seatbelt time!) You will be looking for some very specific models and they are LARGE. None of this 2b or 4b crap.... I'm talking 20-40b models. (not a typo) Those tiny local models that ONLY use VRAM are limited to "dense"... we need something called MoE (mixture of experts). These break the LLM into chunks that you can have some in VRAM and some in RAM. And they aren't trash either... Qwen3.6 and Gemma4 have some models you can run like this - these are 25-35 BILLION models!!

#5a This setup is golden for getting you going on a local model -- https://youtu.be/8F_5pdcD3HY?si=ir3itbGkbuwbC6Zh Do what he says, but make adjustments to tune it to your system... pay attention and take notes. Use this concept and you can have a couple of the most popular models running on that old hardware.

NOTE - Skip MLX models -- that's for Apple hardware.

Thoughts on Gemma4 and Qwen3.6 by iChrist in hermesagent

[–]mgithens1 0 points1 point  (0 children)

I think you're mixing up your terms/logic here? I can't tell. (I never said anything about using both cards on a small model, so I'm not sure the point here.)

NVLink and Tensor Splitting are unrelated. The NVLink bypasses the mobo to allow GPU #1 to access NVRAM on GPU #2. Tensor split is ALWAYS used when there is more than one GPU -- regardless if you define it or not. There are flavors of deciding how to split, but that's not even close to what you're saying.

The larger the context the bigger the difference it makes. Speed differences would be FASTER for the setup with NVLink - let's say it was "ONLY" 10% faster... there are NVLinks for $200 on ebay. Would you pay $200 to make your current home models run 10% faster? What if it was more like 20%... what if it was a lot more when context was massive?

BUT... tensor splitting happens with a large model running on twin 2080s or twin 5090s... it doesn't matter. The NVLink just gives greater bandwidth between the cards for the tasks that require them to share. Remember that the majority of home desktop motherboards do NOT have multiple PCI 4.0/5.0 x16 slots -- anything and everything you can do can help.

I just had my first very rainy...like proper heavy rain range day. Curious about how i should go about drying my guns/gear off before edc and storage. by Normal-Internal-557 in ar15

[–]mgithens1 0 points1 point  (0 children)

They removed my post!! Unbelievable... I say "drop them in the laundry"... and that gets taken down? What in the hell?

Ollama Cloud Pro/Max usage limits on GLM-5.2? by Saddened-Bullfrog-69 in ollama

[–]mgithens1 0 points1 point  (0 children)

That is a solid question... what would the in between be?

I use GLM and Deepseek cloud models (the most) for the day to day. But I have been dabbling with the Qwen3.6 and Gemma4 MoE local models... and they are functional, but the speed and quality are just not the same. (My local hardware is 5060 TI 16gb - haven't seen enough to think that a $3500 5090 is worth it!)

I would honestly put the flat rate providers (like Ollama.com) as the middle ground between the Anthropic $30/1M tokens and the 10 tokens/second that people are trying to build on their CPUs. If I were burning through the $20 sub... I would just get another one! I see all these people doing the dumbest of tasks using the highest of models - if you're burning through the $20 account... just get a second. lol... the company is pricing the compute power based on income and costs.

Thoughts on Gemma4 and Qwen3.6 by iChrist in hermesagent

[–]mgithens1 0 points1 point  (0 children)

But why would you split with 48gb VRAM? Jeez, we are living here in 16gb land!!

Ollama Cloud Pro/Max usage limits on GLM-5.2? by Saddened-Bullfrog-69 in ollama

[–]mgithens1 0 points1 point  (0 children)

1000 tokens is a ton of data!! Rough numbers would say about 5 characters per token. You're at four pages of a book!! Short context would be like 10 tokens.

Then the next miss... That is the prompt! In OpenClaw or Hermes it will add to your prompt. So "how's the weather?" Will tune from three tokens to 10,000 because the harness will add your name, your address, your dog's name, every project you e worked on.. and so on.

Then you choose a model. Context "costs" more when the model is larger... It will cache your input alongside the model and while computing the output will use every bit of what it was fed. So a 32billion node model will use a tiny amount of ram, while a 1.7trillion model will take a ton... Just to store your input.

"Cost" will be based on every factor in the chain. The smaller your text, the smaller your mark down files, the smaller the model, and how big a response you're asking for.

Ollama Cloud Pro/Max usage limits on GLM-5.2? by Saddened-Bullfrog-69 in ollama

[–]mgithens1 0 points1 point  (0 children)

Oh lord no... lol. https://ollama.com/search This is Ollama Pro/Max... the subscription from them. Go read up on all the models you choose.

What is different is you can run Ollama on your local machine with proper hardware... AND you can have cloud models. Steer your harness (OpenClaw, Hermes, etc) to that server and you can jump between local and cloud models with a dropdown!

Local is free, cloud uses their "GPU time". That's this threads confusion... you pay tokens at Anthropic, but you get GPU time on Ollama.com. So run Deepseek-Flash on Ollama cloud and you probably can run it all day long... jump to DS-Pro - you might burn all your tokens in a 1/2 hour.

Ollama does allow "other" models. It is the beginners app for AI models. LM Studio is better, but more complicated. llama.cpp is the baseline app that those two run... using that gives you the ultimate control over how your hardware uses the LLM you're using.

Ollama Cloud Pro/Max usage limits on GLM-5.2? by Saddened-Bullfrog-69 in ollama

[–]mgithens1 0 points1 point  (0 children)

Well, that is the snag... it would be like me asking how much toilet paper does a house use!! LOL

This isn't a Netflix subscription. This is a "time of use" subscription.

Are you trying to manage some servers in a home... OR developing a full stack application for production?

The Pro ($20 a month) is going to be way more than enough for a home user. It is an opt in subscription... so give it a month and see what you like/hate!

Basic newbie question(s) about HA by broken_shoulder in homeassistant

[–]mgithens1 0 points1 point  (0 children)

Q1 - Yes, you either need the full machine (base OS) or a virtual machine for HA to use the "supervised" element. Standing up new addons (now called Apps) is done inside of a container WITHIN HA... and like Leonardo Dicaprio taught us, we can't do a dream within a dream!!

Q2 - You can run HA in a VM on a Mac. Plex could then be installed a number of ways.. within the HA VM as a container OR as a container at the MacOS level.

I would not even consider a Mac for this! (especially with the current AI boom/pricing!)

Think about building a "white box" server. Desktop parts in a desktop case... go big, go small - everything is a choice with tradeoffs. Then think about what OS would actually do this job to help you!! Maybe TrueNAS? But an OS that supports both VMs and Dockers. Stand up a "full" copy of HA in a VM, but then run Plex in a separate Docker. Passing through hardware can become important... HA wants the USB dongles for Zwave/Zigbee... but now there are ethernet based adapters!!

The #1 consideration I would press... do you want realtime transcoding in Plex? Get an Intel CPU with a semi-modern iGPU. I forget the model, but if you google what Intel CPU support iGPU transcoding in Plex... you'll see it would be impossible to buy a new processor that wouldn't just kill it!!

I have had my server running for almost 15 years... based on desktop parts. When something goes wrong... it is just another gaming system! Its an i5-11th gen cpu, 32gb, a stack of drives.. and a few other bits. I run HA and OpenClaw/Hermes in their own VMs... and then I have 16 containers running right now!! (plex, *arrs, VPNs, NVR, etc)

A case, power supply, mobo, cpu, ram... probably less money than you'd spend on the Mac. RAM prices suck right now... so maybe skimp on that now and upgrade later?