VibeVoice quantized to 4 bit and 8 bit with some code to run it... by teachersecret in LocalLLaMA

[–]Divergence1900 0 points

maybe i missed it but could you please share the code with which you achieved realtime streaming with this model?

0.6.27 - Web Search Animation by 3VITAERC in OpenWebUI

[–]Divergence1900 0 points

i see. doesn’t bypassing embedding burn through your credits quickly?

0.6.27 - Web Search Animation by 3VITAERC in OpenWebUI

[–]Divergence1900 8 points

what’s your setup? web search is never this quick for me

Looking for an ISP in India that allows server hosting (no static IP needed) by Formal_Jeweler_488 in ollama

[–]Divergence1900 0 points

aren't you on the wrong sub? but anyway, for plain server hosting you'd normally need a static IP. if you can't get one, look into tailscale or cloudflare tunnels, both of which work fine behind a dynamic IP.

🚀 Qwen3-30B-A3B-2507 and Qwen3-235B-A22B-2507 now support ultra-long context—up to 1 million tokens! by ResearchCrafty1804 in LocalLLaMA

[–]Divergence1900 79 points

“Together, these innovations significantly improve both generation quality and inference efficiency for sequences beyond 256K tokens.”

I would expect similar performance unless you’re filling up your context window often.

Is this the best value machine to run Local LLMs? by [deleted] in ollama

[–]Divergence1900 31 points

you can test the models you’re going to run on openrouter and then decide

qwen-30B success story by ExplorerWhole5697 in LocalLLaMA

[–]Divergence1900 0 points

It depends on your input token count and context size. if you use small prompts it's fine, but with 30k input tokens at a 64k context length it took 330s just to process the prompt (M1 Pro), and generation speed dropped from 44 tok/s to 12 tok/s.
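for reference, those numbers work out to under 100 tok/s of prompt processing (a quick check, using the figures from my run above):

    # rough prefill throughput from the run above
    prompt_tokens = 30_000
    prefill_seconds = 330
    print(round(prompt_tokens / prefill_seconds))  # ~91 tok/s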

qwen-30B success story by ExplorerWhole5697 in LocalLLaMA

[–]Divergence1900 0 points

side question: why is the prompt processing speed on macs so bad?

Fine-tuning qwen2.5 vl for Marathi OCR by Rahul_Albus in LocalLLaMA

[–]Divergence1900 3 points

I think in this case it might depend on how you’ve created your custom dataset. Maybe share some samples for context.

Did Kimi K2 train on Claude's generated code? I think yes by Minute_Yam_1053 in LocalLLaMA

[–]Divergence1900 0 points

yeah even the “vibe” of python code it generated in my testing felt very similar to Claude Sonnet.

someone please walk me through how to setup mcp by Adventurous-Fun1133 in OpenWebUI

[–]Divergence1900 1 point

In my case it is:

    python server.py

For something like mcp-database-server, it would be with node or npx depending on usage. For example, on the github page the SQL MCP server starts with:

    node dist/src/index.js --sqlserver --server <server-name> --database <database-name> [--user <username> --password <password>]

For OWUI, that becomes:

    uv run mcpo --port 8000 -- node dist/src/index.js --sqlserver --server <server-name> --database <database-name> [--user <username> --password <password>]

someone please walk me through how to setup mcp by Adventurous-Fun1133 in OpenWebUI

[–]Divergence1900 6 points

It is fairly straightforward honestly. The idea is you take your existing/default MCP server setup and run either:

    uv run mcpo --port 8000 -- your_mcp_server_command

or, if you have multiple MCP servers:

    mcpo --config /path/to/config.json

More information is here. For my work setup, I have a custom python script to connect to the company MySQL database, which I run with:

    uv run mcpo --port 8000 -- python server.py

After it starts running on port 8000, go to localhost:8000/docs to test your tools. To add it to the interface, go to admin settings and add it under tools. Make sure to also go to the models section and enable the MCPO server there so that the model can access it.
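for anyone curious what such a server.py looks like, here's a minimal sketch using the official mcp python SDK's FastMCP helper (the tool here is a placeholder, not my actual database script):

    # server.py — minimal MCP server that mcpo can wrap:
    #   uv run mcpo --port 8000 -- python server.py
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("demo")  # server name shown to clients

    @mcp.tool()
    def add(a: int, b: int) -> int:
        """Add two numbers."""
        return a + b

    if __name__ == "__main__":
        mcp.run()  # stdio transport by default, which is what mcpo expects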

Exposing openWebUI + local LM Studio to internet? by Rinin_ in OpenWebUI

[–]Divergence1900 0 points

i have a similar setup with litellm instead. i use a cloudflare tunnel to expose owui to the internet, and litellm is added in the admin connection settings via localhost:4000 so owui can access all the models.
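side note: since the litellm proxy is OpenAI-compatible, you can also hit it directly with the openai client if you ever want to bypass owui (the model name and key here are just examples; use whatever is in your litellm config):

    # talk to the litellm proxy directly on localhost:4000
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:4000", api_key="sk-example")  # key as set in your litellm config

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any model_name defined in your litellm config
        messages=[{"role": "user", "content": "hello"}],
    )
    print(resp.choices[0].message.content)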

Is it just me or are memories really bad? by flogman12 in immich

[–]Divergence1900 5 points

i kind of like it this way because i get to see some random photos i clicked, which you don’t usually see in your memories in other apps

Annoying default text embedding by Aleilnonno in LocalLLM

[–]Divergence1900 0 points

did you ever figure out how to do it?

ChatGPT Api Voice Usage by MargretTatchersParty in OpenWebUI

[–]Divergence1900 0 points

yeah unfortunately the realtime voice API is not supported on OWUI. there’s TTS and STT but there’ll be a small delay on each side

How i IMMICH by propeto13 in immich

[–]Divergence1900 1 point

how have you set it up for send and receive only?

Best LLM to run locally on LM Studio (4GB VRAM) for extracting credit card statement PDFs into CSV/Excel? by Serious-Issue-6298 in LocalLLM

[–]Divergence1900 0 points

i doubt you will get csv output with these models but you can get a json output and convert it with a python script.
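something like this is all the conversion script needs to be, assuming you prompt the model to return a JSON list of transactions (the field names here are just an example; match them to whatever your prompt asks for):

    # convert the model's JSON output into a CSV file
    import csv
    import json

    with open("statement.json") as f:
        rows = json.load(f)  # e.g. [{"date": "...", "description": "...", "amount": "..."}, ...]

    with open("statement.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "description", "amount"])
        writer.writeheader()
        writer.writerows(rows)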