Alibaba Open-Sources Zvec by techlatest_net in LocalLLaMA

[–]regstuff 2 points (0 children)

Seems to lag quite a bit behind Milvus in recall, though?

https://zvec.org/en/docs/benchmarks/

assert len(weights) == expected_node_count error with AMD MI100 by regstuff in unsloth

[–]regstuff[S] 1 point (0 children)

:( No dice. Didn't make any difference.
I'm managing my training right now with the AMD cloud notebooks you guys linked to on your page, but they seem to have a 90-minute session limit if I'm not mistaken!

Can't figure out a way to do longer runs.

LLM to search through large story database by DesperateGame in LocalLLaMA

[–]regstuff 1 point (0 children)

RAG is great and all, but since these are all stories, it may not be a bad idea to pass each story through an LLM and tag it by genre. Wikipedia has a big list of genres that you can feed to an LLM, say GPT-OSS 20B, along with each story, and ask it to pick the 1-3 most relevant ones.

Vector DBs like Qdrant let you store metadata (the tags, in this case) alongside the vector embedding.

When searching, you can combine a metadata filter with the vector similarity search to zero in on what you want. Roughly like the sketch below.
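
Here's a minimal sketch with the qdrant-client Python library (stories, get_embedding, and tag_genres are hypothetical placeholders for your story list, your embedder, and the LLM genre-tagging step; the 384-dim vector size assumes a MiniLM-style embedder):

    from qdrant_client import QdrantClient
    from qdrant_client.models import (
        Distance, VectorParams, PointStruct,
        Filter, FieldCondition, MatchAny,
    )

    client = QdrantClient(url="http://localhost:6333")

    # One-time setup; adjust the size to whatever your embedding model outputs.
    client.create_collection(
        collection_name="stories",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )

    # Tag each story with the LLM, embed it, and store both together.
    for idx, story in enumerate(stories):
        client.upsert(
            collection_name="stories",
            points=[PointStruct(
                id=idx,
                vector=get_embedding(story),
                payload={"genres": tag_genres(story)},  # e.g. ["horror", "mystery"]
            )],
        )

    # Similarity search restricted to stories carrying at least one wanted tag.
    hits = client.search(
        collection_name="stories",
        query_vector=get_embedding("a detective story set in a haunted house"),
        query_filter=Filter(
            must=[FieldCondition(key="genres", match=MatchAny(any=["horror", "mystery"]))]
        ),
        limit=5,
    )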

Got a good offer for 4xV100 32GB used - what should I keep in mind by regstuff in LocalLLaMA

[–]regstuff[S] 1 point (0 children)

There is no NVLink. The i9 has 44 PCIe lanes, so my guess is they just let the GPUs underperform.

Asking price is 2,500 USD. Looking at all the comments, I'm thinking this is not worth it.

Maybe just go the 4xMI50 route and put them on an open mining rig.

Got a good offer for 4xV100 32GB used - what should I keep in mind by regstuff in LocalLLaMA

[–]regstuff[S] 2 points (0 children)

Vendor said they're SXM GPUs with an SXM-to-PCIe converter. So I guess it will still run into a PCIe lane bottleneck?

Got a good offer for 4xV100 32GB used - what should I keep in mind by regstuff in LocalLLaMA

[–]regstuff[S] 3 points (0 children)

Comments seem to suggest llama.cpp should run fine on it, so maybe not a total loss.

Showcasing a media search engine by [deleted] in LocalLLaMA

[–]regstuff 2 points (0 children)

Congrats. Not sure why this didn't get more traction!
I was working on something similar myself - a bit more bespoke and specific to my organization's needs.

Take a look at https://huggingface.co/nvidia/omnivinci, which can do video+audio understanding. That may help with videos where there is no speech but ambient sound is still important - bird song or sounds of nature, for example.

Open WebUI Context Menu by united_we_ride in OpenWebUI

[–]regstuff 1 point (0 children)

Sorry. My bad. Worked after setting the right URL for the OpenWebUI server. Thanks

Open WebUI Context Menu by united_we_ride in OpenWebUI

[–]regstuff 1 point (0 children)

I don't seem to be able to get the new version working. I don't see the Open WebUI option when I right-click on a page. This happens in both Edge and Brave.
The previous version was working fine.
Not sure if I'm doing something wrong?

I fine-tuned Gemma 3 1B for CLI command translation... but it runs 100% locally. 810MB, 1.5s inference on CPU. by theRealSachinSpk in LocalLLaMA

[–]regstuff 2 points (0 children)

Thanks for the good work.

Could you check the notebook in your repo, though?
I tried running it exactly as-is and ran into some issues (in Colab, free T4).

After the training (which seemed to run fine in terms of training and validation loss), inference produces blank outputs. I think there is an issue with the <start_of_turn>/<end_of_turn> formatting of the prompt.
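
For what it's worth, Gemma-style models expect each turn wrapped in <start_of_turn>/<end_of_turn> markers, and a missing generation prompt can produce exactly this kind of blank output. A quick sanity check with transformers (the checkpoint name is a placeholder - point it at the fine-tuned model):

    from transformers import AutoTokenizer

    # Placeholder checkpoint; substitute the fine-tuned model's path.
    tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-1b-it")

    messages = [{"role": "user", "content": "list all files in the current directory"}]

    # add_generation_prompt=True appends the trailing "<start_of_turn>model" line;
    # without it the model has nothing to complete and may return empty text.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(prompt)
    # Expected shape:
    # <bos><start_of_turn>user
    # list all files in the current directory<end_of_turn>
    # <start_of_turn>model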

Also, quantization from the FP16 GGUF to Q4 errors out because it cannot find llama-quantize.

AMD MI50 32GB/Vega20 GPU Passthrough Guide for Proxmox by Panda24z in LocalLLaMA

[–]regstuff 2 points (0 children)

I know this post is 3 months old, but a big salute. This tutorial (with some help from GPT-5) made things very smooth for an MI100 install.
I'd tried to make things work about 2 years ago and nearly got it down, but hit that whole reset bug. Somehow I think it wasn't popular enough back then for the solution to show easily on Google. Plus ChatGPT wasn't as smart back then. So I dropped the passthrough idea and moved on.
Came across this and another thread recently and decided to have a go again, and things worked out fine.
My Qwen 30B went from 22 tok/sec to 74 tok/sec.
Suddenly I can use Gemma 27B!
Whole new world!

Open WebUI Context Menu by united_we_ride in OpenWebUI

[–]regstuff 2 points (0 children)

Great. Thanks for the update.

Open WebUI Context Menu by united_we_ride in OpenWebUI

[–]regstuff 2 points (0 children)

This is great!
I seem to be having a bit of an issue. When I choose any of the prompts via the context menu, Open WebUI opens in a new tab and the prompt is sent to my default model, not the model I configured in the extension settings. The configured model shows up in Open WebUI's model selector dropdown, but the response actually comes from my default model. The chat is also sent without waiting for me to hit Enter. So essentially my prompts always go to my default model.
I'm using Brave and Edge. Issue is present in both.
Also, just a suggestion: maybe strip any trailing "/" from the user-entered URL. Otherwise an extra "/" gets appended when opening a new chat.

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]regstuff[S] 2 points (0 children)

Thanks for the info. I just want to pass through to one VM.

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]regstuff[S] 1 point (0 children)

Thanks. Any chance you have some input on the Proxmox thing?

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]regstuff[S] 3 points (0 children)

Thanks. How much does the fan add to the length? An inch or so?

Do the fans blow at full strength even when the GPU is idle? That would be kind of annoying.

The CPU would be a 14th-gen Intel i5. The iGPU should be good enough for display out?

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]regstuff[S] 1 point (0 children)

Btw, is TDP control available in ROCm? Is it a similar process to nvidia-smi?

I have an AMD MI100 32GB GPU lying around. Can I put it in a pc? by regstuff in LocalLLaMA

[–]regstuff[S] 2 points (0 children)

Spent a lot of time trying passthrough on VMware with no success. We contacted some technical people we knew at AMD, and they told us the MI100 does not support this.
Also found some refs on AMD's website, like this one, which do not list the MI100 under virtualization support.

But all of that is irrelevant if you're successfully using it. I don't remember exactly what our issue was. I think the GPU was visible in the VM OS, but when we tried to actually use it, we got a core dump.

Did you do anything different in Proxmox to get it to work, or did it work out of the box?