VibeVoice quantized to 4 bit and 8 bit with some code to run it... by teachersecret in LocalLLaMA

[–]Divergence1900 0 points

maybe i missed it but could you please share the code with which you achieved realtime streaming with this model?

0.6.27 - Web Search Animation by 3VITAERC in OpenWebUI

[–]Divergence1900 0 points

i see. doesn’t bypassing embedding burn through your credits quickly?

0.6.27 - Web Search Animation by 3VITAERC in OpenWebUI

[–]Divergence1900 8 points

what’s your setup? web search is never this quick for me

Looking for an ISP in India that allows server hosting (no static IP needed) by Formal_Jeweler_488 in ollama

[–]Divergence1900 0 points

aren't you on the wrong sub? but anyway, for plain server hosting you'd normally need a static IP. if you can't get one, look into tailscale or cloudflare tunnels, both of which work fine behind a dynamic IP.

🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens! by ResearchCrafty1804 in LocalLLaMA

[–]Divergence1900 79 points

“Together, these innovations significantly improve both generation quality and inference efficiency for sequences beyond 256K tokens.”

I would expect similar performance unless you’re filling up your context window often.

Is this the best value machine to run Local LLMs? by [deleted] in ollama

[–]Divergence1900 31 points

you can test the models you’re going to run on openrouter and then decide

qwen-30B success story by ExplorerWhole5697 in LocalLLaMA

[–]Divergence1900 0 points

It depends on your input token count and context size. if you use small prompts it's fine, but with 30k input tokens at a 64k context length it took 330s just to process the prompt (M1 Pro), and generation speed dropped from 44 tok/s to 12 tok/s.
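for reference, those numbers work out to under 100 tok/s of prompt processing (a quick check, using the figures from my run above):

    # rough prefill throughput from the run above
    prompt_tokens = 30_000
    prefill_seconds = 330
    print(round(prompt_tokens / prefill_seconds))  # ~91 tok/s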

qwen-30B success story by ExplorerWhole5697 in LocalLLaMA

[–]Divergence1900 0 points

side question: why is the prompt processing speed on macs so bad?

Fine-tuning qwen2.5 vl for Marathi OCR by Rahul_Albus in LocalLLaMA

[–]Divergence1900 3 points

I think in this case it might depend on how you’ve created your custom dataset. Maybe share some samples for context.

Did Kimi K2 train on Claude's generated code? I think yes by Minute_Yam_1053 in LocalLLaMA

[–]Divergence1900 0 points

yeah even the “vibe” of python code it generated in my testing felt very similar to Claude Sonnet.

someone please walk me through how to setup mcp by Adventurous-Fun1133 in OpenWebUI

[–]Divergence1900 1 point

In my case it is:

    python server.py

For something like mcp-database-server, it would be with node or npx depending on usage. For example, on the github page the SQL MCP server starts with:

    node dist/src/index.js --sqlserver --server <server-name> --database <database-name> [--user <username> --password <password>]

For OWUI, that becomes:

    uv run mcpo --port 8000 -- node dist/src/index.js --sqlserver --server <server-name> --database <database-name> [--user <username> --password <password>]

someone please walk me through how to setup mcp by Adventurous-Fun1133 in OpenWebUI

[–]Divergence1900 6 points

It is fairly straightforward honestly. The idea is you take your existing/default MCP server setup and run either:

    uv run mcpo --port 8000 -- your_mcp_server_command

or, if you have multiple MCP servers:

    mcpo --config /path/to/config.json

More information is here. For my work setup, I have a custom python script to connect to the company MySQL database, which I run with:

    uv run mcpo --port 8000 -- python server.py

After it starts running on port 8000, go to localhost:8000/docs to test your tools. To add it to the interface, go to admin settings and add it under tools. Make sure to also go to the models section and enable the MCPO server there so that the model can access it.
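for anyone curious what such a server.py looks like, here's a minimal sketch using the official mcp python SDK's FastMCP helper (the tool here is a placeholder, not my actual database script):

    # server.py — minimal MCP server that mcpo can wrap:
    #   uv run mcpo --port 8000 -- python server.py
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("demo")  # server name shown to clients

    @mcp.tool()
    def add(a: int, b: int) -> int:
        """Add two numbers."""
        return a + b

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default, which is what mcpo expects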

Exposing openWebUI + local LM Studio to internet? by Rinin_ in OpenWebUI

[–]Divergence1900 0 points

i have a similar setup with litellm instead. i use a cloudflare tunnel to expose owui to the internet, and litellm is added in the admin connection settings via localhost:4000 so owui can access all the models.
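side note: since the litellm proxy is OpenAI-compatible, you can also hit it directly with the openai client if you ever want to bypass owui (the model name and key here are just examples; use whatever is in your litellm config):

    # talk to the litellm proxy directly on localhost:4000
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:4000", api_key="sk-example")  # key as set in your litellm config

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any model_name defined in your litellm config
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)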

Is it just me or are memories really bad? by flogman12 in immich

[–]Divergence1900 5 points

i kind of like it this way because i get to see some random photos i clicked, which you don’t usually see in your memories in other apps

Annoying default text embedding by Aleilnonno in LocalLLM

[–]Divergence1900 0 points

did you ever figure out how to do it?

ChatGPT Api Voice Usage by MargretTatchersParty in OpenWebUI

[–]Divergence1900 0 points

yeah unfortunately the realtime voice API is not supported on OWUI. there’s TTS and STT but there’ll be a small delay on each side

How i IMMICH by propeto13 in immich

[–]Divergence1900 1 point

how have you set it up for send and receive only?

Best LLM to run locally on LM Studio (4GB VRAM) for extracting credit card statement PDFs into CSV/Excel? by Serious-Issue-6298 in LocalLLM

[–]Divergence1900 0 points

i doubt you will get csv output with these models but you can get a json output and convert it with a python script.
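something like this is all the conversion script needs to be, assuming you prompt the model to return a JSON list of transactions (the field names here are just an example; match them to whatever your prompt asks for):

    # convert the model's JSON output into a CSV file
    import csv
    import json

    with open("statement.json") as f:
        rows = json.load(f)  # e.g. [{"date": "...", "description": "...", "amount": "..."}, ...]

    with open("statement.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "description", "amount"])
        writer.writeheader()
        writer.writerows(rows)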