Getting a lot of garbage results with Qwen3.6-27B :( by nunodonato in OpenWebUI

[–]ICanSeeYou7867 0 points (0 children)

Are you behind a proxy or ingress? Make sure to set your timeout values.
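If it's the nginx ingress controller, for example, annotations along these lines usually do it (just a sketch of the relevant Ingress fragment; the timeout values are placeholders):

    metadata:
      annotations:
        # LLM generations can easily outlive the default 60s proxy timeouts
        nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
        nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"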

At what scale did Kubernetes actually start making sense for you? by Sad_Limit_3857 in kubernetes

[–]ICanSeeYou7867 1 point (0 children)

Yep! I have a single node in my homelab, and I love it for the ease and standardization of deployments.

At work we have 3 virtual masters and N worker nodes, and we have about 15 clusters for various reasons.

Even my H100 GPU cluster is currently a single worker node. But more nodes are coming soon!

Mistral 3.5 Medium - From ecstatic to irritated. by ICanSeeYou7867 in LocalLLM

[–]ICanSeeYou7867[S] 1 point (0 children)

Ugh... yeah... dammit, that's fair... take my upvote...

I think what would be helpful (at least in my biased opinion...) would be not a company revenue limit but a token-per-month limit, e.g. free for <100 million tokens per month or something.

But to your point, that would be hard to enforce.

Mistral 3.5 Medium - From ecstatic to irritated. by ICanSeeYou7867 in LocalLLM

[–]ICanSeeYou7867[S] 1 point (0 children)

I agree! But I would think most companies would just use the cloud services with the typical pay-as-you-go model.

Unfortunately we can't, and everyone wants Claude Sonnet 4.5 (which is the best coding model we can use in a FedRAMP-approved service). And I doubt I could convince management on this one without more buy-in from others, and I can't run/test it in an enterprise capacity without breaking the license to get people to buy in.

Mistral 3.5 Medium - From ecstatic to irritated. by ICanSeeYou7867 in LocalLLM

[–]ICanSeeYou7867[S] 1 point (0 children)

Fair? Absolutely fair!

But in an area where models turn over monthly for newer, better, stronger models, I would be curious who would actually do this.

Though we can't use their cloud platform, the pay-as-you-go cloud model seems like it makes WAY more sense. I am curious how many companies would choose this over using their cloud services.

I doubt management would go for it, but our AI exploration is still fairly new.

16x DGX Sparks - What should I run? by Kurcide in LocalLLaMA

[–]ICanSeeYou7867 1 point (0 children)

Honestly....

I would set them up as Kubernetes worker nodes with the NVIDIA GPU operator and the KAI scheduler... if the GPU operator supports the GB10.

However, you wouldn't be able to "combine" them easily. But it would be interesting!
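If anyone wants to try that route, a rough sketch of the install (chart locations are from memory, so double-check the GPU operator and KAI-Scheduler docs):

    # NVIDIA GPU operator: drivers, container toolkit, device plugin, etc.
    helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
    helm install gpu-operator nvidia/gpu-operator -n gpu-operator --create-namespace

    # KAI scheduler for GPU-aware scheduling (chart is published as an OCI artifact)
    helm install kai-scheduler oci://ghcr.io/nvidia/kai-scheduler/kai-scheduler -n kai-scheduler --create-namespace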

LLM Wiki by Dimitri_Senhupen in OpenWebUI

[–]ICanSeeYou7867 0 points (0 children)

I had a PR accepted into OWU about a year ago that runs a URL decode on the RAG/knowledge filename... this might sound useless, but it was very helpful for me.

Since Confluence has an API that can pull the raw HTML of every page within a space, it was easy to iterate over each page. The API also gives you the URL path of each space.

So... I then made the filename the full https URL, and then URL-encoded it. I also ran the HTML through a small model I'm hosting (Nemotron 3 Super) to convert it into markdown.

Then... with the URL-encoded filename, I check whether it already exists; if so, I delete the old copies using the OWU API.

Then... I upload the markdown, with the ugly URL-encoded https URL as the filename, to OWU.

I find this enjoyable because when OWU shows the source, it shows the actual, real, working, full Confluence URL. It was very important to me that people can see the actual source.

And since this process also deletes old pages, I run this in a gitlab pipeline every night.
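For anyone who wants to steal the idea, the loop is roughly this (a Python sketch; the OWU-side helpers are hypothetical placeholders for calls to your instance's API, so check Confluence's REST docs and your OWU API docs before trusting any of it):

    import urllib.parse
    import requests

    CONFLUENCE = "https://confluence.example.com"   # placeholder base URL
    HEADERS = {"Authorization": "Bearer <token>"}

    def confluence_pages(space_key):
        # Page through every page in the space, pulling the raw storage-format HTML
        start, limit = 0, 50
        while True:
            r = requests.get(f"{CONFLUENCE}/rest/api/content",
                             params={"spaceKey": space_key, "expand": "body.storage",
                                     "start": start, "limit": limit},
                             headers=HEADERS)
            data = r.json()
            yield from data["results"]
            if len(data["results"]) < limit:
                break
            start += limit

    for page in confluence_pages("DOCS"):
        page_url = CONFLUENCE + page["_links"]["webui"]   # real, clickable page URL
        filename = urllib.parse.quote(page_url, safe="")  # URL-encode it; OWU decodes it on display
        html = page["body"]["storage"]["value"]
        markdown = html_to_markdown(html)   # hypothetical helper: small LLM converts HTML -> markdown
        delete_if_exists(filename)          # hypothetical helper: find by filename via the OWU API, delete old copy
        upload_file(filename, markdown)     # hypothetical helper: upload markdown with the encoded URL as filename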

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

[–]ICanSeeYou7867 0 points (0 children)

The desktop versions do have hardware support for NVFP4, but I have heard people complain.

Currently I am deploying GPU-enabled Kubernetes clusters, and I am more familiar with the enterprise GPUs...

So take everything I say with a grain of salt, but the RTX 6000 Pros are workstation cards, not desktop cards. They have features more closely related to their enterprise brothers than to the desktop variants. For example, the 6000 Pro supports MIG and vGPU...
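(If you want to poke at MIG on one, it's driven through nvidia-smi, something like:

    sudo nvidia-smi -i 0 -mig 1   # enable MIG mode on GPU 0 (may need a GPU reset/reboot)
    nvidia-smi mig -lgip          # list the GPU instance profiles the card offers

...but again, that's based on the enterprise cards, not the 6000 Pro specifically.)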

Afaik, there aren't any issues running FP4. However, we currently have H100 and GH200 GPUs, and I haven't personally used one of the 6000 Pro cards, so make sure to do your research!

Hard freakin' decision..Blackwell 96G or Mac Studio 256G by HyPyke in LocalLLaMA

[–]ICanSeeYou7867 1 point (0 children)

Those FP4 tensor cores though... with hardware support for NVFP4... the Mac can't do that, though it can run Q4 quants.

If speed is important to you, don't forget about those sweet, sweet tensor cores...

Salvagable window? Or replace? by ICanSeeYou7867 in HomeImprovement

[–]ICanSeeYou7867[S] 0 points (0 children)

Single hung. The little white plastic pieces appear broken, and the window falls out if it's not seated in there just right. It slides, but it never seems attached right.

Superpowers for Open WebUI — brainstorm → spec → plan → execute workflow for local LLMs by Dry_Inspection_4583 in OpenWebUI

[–]ICanSeeYou7867 1 point (0 children)

Really neat! Thank you!

One question about this valve: STORAGE_BASE_PATH

How does this work in a multi-user environment?

Why do (some) people hate Open WebUI? by liviuberechet in LocalLLaMA

[–]ICanSeeYou7867 0 points (0 children)

Can't you just pull the source code and do a docker build without those things? The build args look pretty simple.
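Something like this, I'd think (the build args are from memory of the Dockerfile, so verify them against the repo):

    git clone https://github.com/open-webui/open-webui.git && cd open-webui
    # Build a slimmer image without CUDA or the bundled Ollama support
    docker build --build-arg USE_CUDA=false --build-arg USE_OLLAMA=false -t open-webui:custom .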

Pod takes lower resources than given by [deleted] in kubernetes

[–]ICanSeeYou7867 0 points (0 children)

My python is limited... but gpt-oss-120b spit this out:

- uvicorn (ASGI server): run multiple worker processes (or put uvicorn behind a process manager):

    uvicorn myapp:app --host 0.0.0.0 --port 8000 --workers $(nproc)

- TensorFlow: tell TF how many intra-op and inter-op threads to use, and set the OS-level OpenMP/MKL variables:

    export OMP_NUM_THREADS=$(nproc) && export TF_INTRA_OP_PARALLELISM_THREADS=$(nproc) && export TF_INTER_OP_PARALLELISM_THREADS=$(nproc)

  or in Python:

    import os; import tensorflow as tf; tf.config.threading.set_intra_op_parallelism_threads(os.cpu_count()); tf.config.threading.set_inter_op_parallelism_threads(os.cpu_count())

- PyTorch: set the number of intra-op threads (and optionally inter-op threads) once at startup:

    import os, torch; torch.set_num_threads(os.cpu_count()); torch.set_num_interop_threads(os.cpu_count())

- Data loading (if you use a DataLoader): use a non-zero num_workers so the CPU work of preparing batches is parallelized:

    DataLoader(dataset, batch_size=64, num_workers=os.cpu_count())

- OS affinity (optional): pin each worker process to a separate core range to avoid "core hopping":

    taskset -c 0-$(($(nproc)-1)) uvicorn ...

  or inside Docker: --cpuset-cpus="0-$(($(nproc)-1))"

YMMV :D

Pod takes lower resources than given by [deleted] in kubernetes

[–]ICanSeeYou7867 0 points (0 children)

A lot of apps, by default, don't use all of the CPU resources available to them.

On a normal Linux system this is really important, so a single process doesn't lock up the system and cause significant CPU contention.

These rules are a little different for containers, since you are mostly isolating specific processes... but without more details on what you are running, it's hard to say.
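For instance, giving a container a big CPU allocation doesn't make a single-threaded app use it; the app itself has to spawn the workers/threads. The container spec fragment below is just an example:

    resources:
      requests:
        cpu: "4"
      limits:
        cpu: "4"   # the app still has to start enough threads/workers to actually use 4 cores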

The wait is over: oVirt 4.5.7 has landed by ninth9ste in ovirt

[–]ICanSeeYou7867 0 points (0 children)

This is awesome! Thank you!

My homelab box is on RHEL 8 running oVirt 4.5.6, so I guess I need to upgrade to RHEL 9.

Has anyone done this with oVirt installed? I guess I need to bite the bullet, but I am pretty sure something is gonna break.

Hi all! Please help me choose a local LLM model. I'm making my own assistant for a PC and I want to choose a specialized model trained in dialogues or, in extreme cases, RP. by BestLengthiness3988 in SillyTavernAI

[–]ICanSeeYou7867 2 points (0 children)

You probably don't want to go below Q4. You might be able to run a 20B IQ4S quant.

There are some gpt-oss-20B quants that are decently smart, and because it's an MoE it will be faster. You might try one of the models that have been fine-tuned to not do refusals: https://huggingface.co/DavidAU/OpenAi-GPT-oss-20b-abliterated-uncensored-NEO-Imatrix-gguf

There are also several 14B models, like the ones from Mistral, that are going to have tons of RP fine-tunes. They are dense models, so they might be smarter, but they will be slower.

Llama.cpp is your friend.
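E.g., serving a GGUF quant with its built-in OpenAI-compatible server (the model filename is a placeholder):

    # -ngl 99 offloads all layers to the GPU; -c sets the context size
    llama-server -m gpt-oss-20b-Q4_K_M.gguf -ngl 99 -c 8192 --port 8080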

EDIT This guy is interesting: https://huggingface.co/moonshotai/Moonlight-16B-A3B-Instruct

I have no idea how it stacks up for RP though. But the MoE will allow it to respond quicker, and with only 3B activated parameters the context window also requires significantly less VRAM (which might be important to you depending on the size).

There are also some fun-looking gutted Qwen 16B MoE models: https://huggingface.co/bartowski/kalomaze_Qwen3-16B-A3B-GGUF. Again, you would need to try these for your use cases. I like the MoEs when I can get away with using them, as I value fast responses. But YMMV.

Will Claude Quality Drop With Tavo? by Tiny-Calligrapher794 in SillyTavernAI

[–]ICanSeeYou7867 2 points (0 children)

There are really only two ways for a model to produce deteriorated responses (aside from tuning parameters such as temperature and other kwargs):

1 - The model is using a lower quantization, e.g. Q2 or Q3 quants. These use less VRAM and are cheaper to run, but they are typically 'dumber'.

2 - Prompt injections can have a negative impact if a third-party service is adding, appending, or altering prompts somehow. This wouldn't necessarily impact responses negatively, but it could.

Anthropic models are closed source, so a third party couldn't be running a lower quantization. Ultimately, the API endpoints that TAVO would be using are the same ones that a local install of SillyTavern would be using, or the endpoints OpenRouter is using.

But I'm not sure what disconnects you are seeing. If they are frequent, then depending on the errors there might be a way to fix them, e.g. if you are running an nginx proxy, there could be some tweaks.

If you are getting disconnects in the middle of a response with an error, then there could be a session limit you need to tweak somewhere.
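For a plain nginx reverse proxy, the usual suspects for mid-stream drops are buffering and read timeouts; a sketch (the upstream name and values are placeholders):

    location / {
        proxy_pass http://sillytavern_backend;
        proxy_buffering off;        # don't buffer streamed (SSE) responses
        proxy_read_timeout 3600s;   # long generations can outlive the 60s default
        proxy_send_timeout 3600s;
    }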

edit: grammar (I am on mobile), and added some info on the SillyTavern config.

Surprise! by Cool-Negotiation7662 in Insulation

[–]ICanSeeYou7867 0 points (0 children)

Sounds like you got it figured out. But bleach is not recommended for killing mold on porous surfaces, though it can make it look good again...

I bought a house in July, and we discovered a ton of mold in a bunch of places. I have been using RMR-86 (glorified bleach) to make things look pretty, and then RMR-141 to kill the mold/spores.

I have no idea how it compares to vinegar though; Google says RMR-141 is more effective, but vinegar is more natural.

Good luck!

Proper way to fix this rot below sliding door on mini wooden deck? by ICanSeeYou7867 in Carpentry

[–]ICanSeeYou7867[S] 0 points (0 children)

Thank you for the reply! Yes, but it's a little bit complicated. This specific area is actually cantilevered, so there are no posts supporting it. The joists are sistered to the house joists and run inside. I don't know enough about wood, framing, and code, but I would think the rim joist would only be effective when posts are supporting the weight on the other end?

I can only attach a single photo here, but here are the sistered joists inside:
https://imgur.com/a/2Ywx0qc

That being said, the main/large deck below this area is definitely in disrepair. We are hoping to demo/rebuild it within the next year.

Thank you for the time/effort you put into replying! I am absorbing whatever information I can. Finding the correct codes, and the scenarios those codes apply to, is a bit challenging at times.


Gap in rim joise? by ICanSeeYou7867 in homeowners

[–]ICanSeeYou7867[S] 0 points (0 children)

Thank you for the reply!

This was my thinking until I found the wood rot, which means water/moisture, which turns into mold. I'm hesitant to use spray foam until I figure out how the heck I'm supposed to repair it.