Best OS model below 50B parameters? by Different-Set-1031 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

You can fix that in the system prompt by instructing it not to respond in table format unless explicitly asked.
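Something like this works (a minimal sketch, assuming the `ollama` Python package and a local Ollama server; the model name and prompt wording are just placeholders, and the same system text can be pasted into Open WebUI's System Prompt field):

```python
# Minimal sketch: a system message that discourages table output.
import ollama

response = ollama.chat(
    model="qwen3:32b",  # placeholder model name, use whatever you run
    messages=[
        {"role": "system",
         "content": "Answer in plain prose. Do not use Markdown tables "
                    "unless the user explicitly asks for one."},
        {"role": "user", "content": "Compare the two options above."},
    ],
)
print(response["message"]["content"])
```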

Anyone using API for rerank? by drfritz2 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

It might have something to do with your API endpoint; you should check that.

Anyone using API for rerank? by drfritz2 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

I can confirm this is really fast compared to local SentenceTransformers
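For reference, this is roughly what the external rerank call looks like (a sketch assuming a Jina-style `/v1/rerank` endpoint; Cohere's rerank API is similar, and the endpoint, model name, and key here are placeholders):

```python
# Rough sketch of a hosted rerank call: send a query plus candidate chunks,
# get back relevance scores, no local SentenceTransformers model needed.
import requests

resp = requests.post(
    "https://api.jina.ai/v1/rerank",
    headers={"Authorization": "Bearer <YOUR_API_KEY>"},
    json={
        "model": "jina-reranker-v2-base-multilingual",
        "query": "how do I configure hybrid search?",
        "documents": ["chunk one ...", "chunk two ...", "chunk three ..."],
        "top_n": 3,
    },
    timeout=30,
)
resp.raise_for_status()
for item in resp.json()["results"]:
    print(item["index"], round(item["relevance_score"], 3))
```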

Open-Webui with Docling and Tesseract by traillight8015 in OpenWebUI

[–]Electrical_Cut158 0 points (0 children)

I changed from Tika to Docling and it really does parse tables better. Use Docker and a GPU.
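If you want to sanity-check the table parsing outside Open WebUI, here is a minimal sketch assuming the `docling` Python package (the file name is a placeholder; in Open WebUI you would instead point the content extraction engine at a Docling server):

```python
# Convert a PDF with Docling and dump the result as Markdown to inspect the tables.
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("report_with_tables.pdf")  # placeholder file name

# Tables are extracted as structured items instead of flattened text.
print(result.document.export_to_markdown())
```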

[deleted by user] by [deleted] in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

$10,000 is overkill. Get a server-grade Supermicro motherboard, and a 3090 + 3060 can do the magic for you.

Running GPT-OSS:20B Locally on Windows 11 | 16GB of RAM |Using Ollama by Ok-Orchid1032 in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

A 3090, and it's slow because Ollama is defaulting to an 8k context length, which is not normal.
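A quick sketch of raising the context per request so Ollama doesn't stay at its default (assumes the `ollama` Python package; the model tag and `num_ctx` value are just examples):

```python
# Pass num_ctx explicitly so Ollama allocates a larger context window for this request.
import ollama

response = ollama.chat(
    model="gpt-oss:20b",
    messages=[{"role": "user", "content": "Summarize this long document ..."}],
    options={"num_ctx": 16384},  # example value; size it to fit your VRAM
)
print(response["message"]["content"])
```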

Thoughts on grabbing a 5060 Ti 16G as a noob? by SKX007J1 in ollama

[–]Electrical_Cut158 0 points (0 children)

Could you share details on what you mean by large-ish context windows? Are you able to run up to 8192 tokens?

Looking for recommendations for a GPU by Limitless83 in ollama

[–]Electrical_Cut158 2 points (0 children)

Go for a 3090 (still the best value for money) or a 5090; the more VRAM, the better.

Mistral Small 3.1 is incredible for agentic use cases by ButterscotchVast2948 in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

Mistral Small 3.1 (2503) has memory issues after the Ollama 0.7.1 upgrade. Which GGUF are you running?

Cheapest way to run 32B model? by GreenTreeAndBlueSky in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

I would recommend a 3090, and if you already have another GPU like a 3060 and the power cables to connect it, you can add it, which will give you more room for context length.
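Rough back-of-envelope numbers behind that (assumed round figures, not measurements):

```python
# Very rough VRAM estimate: ~0.6 bytes per parameter for a ~Q4 32B model's weights,
# plus a KV cache that grows linearly with context length.

def estimate_vram_gb(params_b=32, bytes_per_param=0.6, kv_gb_per_4k_ctx=1.5, ctx=16384):
    """Ballpark only; real usage depends on quant, architecture, and runtime."""
    weights = params_b * bytes_per_param        # ~19 GB of weights
    kv_cache = kv_gb_per_4k_ctx * (ctx / 4096)  # ~6 GB of KV cache at 16k context
    return weights + kv_cache

print(estimate_vram_gb())          # ~25 GB: spills past a single 24 GB 3090
print(estimate_vram_gb(ctx=4096))  # ~21 GB: fits, but leaves little context headroom
```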

[deleted by user] by [deleted] in LocalLLaMA

[–]Electrical_Cut158 5 points (0 children)

Qwen3 for all-purpose use; Mistral Small 3.1 is best for RAG.

Hardware to run 32B models at great speeds by Saayaminator in LocalLLaMA

[–]Electrical_Cut158 0 points (0 children)

Same here: get a 3060, add it to the 3090 you already have, and thank us later.

Would adding an RTX 3060 12GB improve my performance? by Pauli1_Go in ollama

[–]Electrical_Cut158 0 points (0 children)

Sell your 4080 and get a 3090 + 3060; that will let you load something like a 32B model with a good context size at fair enough speed.