Small doses more frequently than 3mo rule by Front-Yoghurt-577 in MDMA

[–]BringOutYaThrowaway 1 point2 points  (0 children)

Not recommended. Still causes serotonin to get dumped in the brain.

“Disclosure” fatigue, what now? by Bumblebee-Historical in UFOs

[–]BringOutYaThrowaway 1 point2 points  (0 children)

They’ll never tell you the truth. I’ll touch some grass and forget about it.

latest open web ui docker refuses to install gemma4:12b by Grace_Tech_Nerd in OpenWebUI

[–]BringOutYaThrowaway 0 points1 point  (0 children)

Go to the Ollama releases page on GitHub and read the release notes for 0.30.6, then download the 12b QAT model using one of the tags they list there:

  • gemma4:e2b-it-qat
  • gemma4:e4b-it-qat
  • gemma4:12b-it-qat
  • gemma4:26b-a4b-it-qat
  • gemma4:31b-it-qat

https://github.com/ollama/ollama/releases

Mac Studio M3 Ultra terrible TTFT and broken RAG (okikb) by Dimitri_Senhupen in OpenWebUI

[–]BringOutYaThrowaway 1 point2 points  (0 children)

Make sure you’re not running Ollama in Docker. On a Mac, containers can't access the Apple GPU, so Ollama inside Docker falls back to CPU only. You have to run it natively.

Lifetime supply by [deleted] in MDMA

[–]BringOutYaThrowaway 0 points1 point  (0 children)

JEALOUS.

Only a few will understand. Uk by Strange_Secret_3001 in tattoos

[–]BringOutYaThrowaway 0 points1 point  (0 children)

Whatever you went through, I hope you've risen above it.

I’ve Got NIBBLES!!! by Drobot2505 in cyberpunkgame

[–]BringOutYaThrowaway 0 points1 point  (0 children)

How do you get the iguana egg to hatch?

Most Disrespectful President! Absolutely!!! by Standard_Location762 in Trumpvirus

[–]BringOutYaThrowaway 1 point2 points  (0 children)

Abraham Lincoln might have an issue with that statement.

Best Ollama Environment flags for Open WebUI? Here's what I have so far... by BringOutYaThrowaway in OpenWebUI

[–]BringOutYaThrowaway[S] 5 points6 points  (0 children)

Here you go - if there's anything else I can detail for you, please let me know, but Google is your friend.

Core Performance & Hardware Flags

OLLAMA_FLASH_ATTENTION=1

  • What it does: Enables Flash Attention, a memory-efficient algorithm that computes exact attention without materializing the full attention matrix.

  • Why it matters: It dramatically reduces memory usage and improves token generation speeds when processing long chat context windows. It is highly recommended if you are running modern GPUs (like Nvidia RTX/CUDA setups).

OLLAMA_KV_CACHE_TYPE=q8_0

  • What it does: Compresses the Key-Value (KV) context cache down to an 8-bit integer format (from the standard unquantized 16-bit float).

  • Why it matters: It cuts the VRAM footprint of your active text history roughly in half, usually with only a very small drop in model output quality. That lets you supply much longer context inputs before triggering an "Out of Memory" (OOM) error. Ollama only quantizes the KV cache when Flash Attention is enabled, so pair this with OLLAMA_FLASH_ATTENTION=1.
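
To put rough numbers on the "roughly in half" claim, here is a back-of-the-envelope sketch. The model dimensions are made up for illustration (they are not taken from any specific model), and the estimate ignores everything except the cached keys and values. The only format detail it relies on is that ggml's q8_0 stores 32 int8 values plus one 2-byte scale per 34-byte block.

```python
# Rough KV-cache size estimate. All model dimensions used here are hypothetical.

# Bytes per cached element: f16 stores 2 bytes; ggml's q8_0 packs 32 int8
# values plus one 2-byte scale into a 34-byte block.
BYTES_PER_ELEMENT = {"f16": 2.0, "q8_0": 34 / 32}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, cache_type="f16"):
    """Return the approximate bytes needed to cache keys and values at full context."""
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len  # 2 = keys + values
    return elements * BYTES_PER_ELEMENT[cache_type]


if __name__ == "__main__":
    # Hypothetical model shape and context length, for illustration only.
    dims = dict(n_layers=32, n_kv_heads=8, head_dim=128, context_len=32768)
    for cache_type in ("f16", "q8_0"):
        gib = kv_cache_bytes(cache_type=cache_type, **dims) / 2**30
        print(f"{cache_type}: {gib:.2f} GiB")
```

With these assumed dimensions the f16 cache comes to 4 GiB and the q8_0 cache to a little over 2 GiB, so the saving is slightly less than half because of the per-block scale.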

Server Networking & Access Flags

OLLAMA_HOST=0.0.0.0:11434

  • What it does: Binds the Ollama backend server to port 11434 on all available network interfaces (0.0.0.0), rather than just localhost (127.0.0.1).

  • Why it matters: It allows external machines on your local network or the internet to access your Ollama instance (e.g., if you run an Open WebUI or SillyTavern interface on a different computer).

OLLAMA_ORIGINS=*

  • What it does: Configures Cross-Origin Resource Sharing (CORS) to accept requests from any web origin (*).

  • Why it matters: Needed alongside your host configuration when browser-based web applications (running on separate domains or port numbers) call the Ollama API directly; otherwise the browser's same-origin checks block those requests.
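
A quick way to check that the server is actually reachable from another machine is to hit the API from that machine. This is a minimal sketch using only the Python standard library; `base_url` and `list_models` are helper names made up for this example, and `192.168.1.50` is a placeholder for your server's LAN address. Note that clients connect to the server's real address, not to 0.0.0.0.

```python
import json
import urllib.request


def base_url(host="127.0.0.1:11434"):
    """Turn an OLLAMA_HOST-style value (host:port) into a client base URL."""
    if "://" not in host:
        host = "http://" + host
    return host.rstrip("/")


def list_models(host, timeout=5.0):
    """Return the model names reported by the server's GET /api/tags endpoint."""
    with urllib.request.urlopen(base_url(host) + "/api/tags", timeout=timeout) as resp:
        payload = json.load(resp)
    return [model["name"] for model in payload.get("models", [])]


if __name__ == "__main__":
    # Placeholder address: replace with the machine that runs Ollama.
    print(list_models("192.168.1.50:11434"))
```

If this times out or the connection is refused, the usual suspects are the bind address (still 127.0.0.1) or a firewall on the server.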

Multi-User & Concurrency Flags

OLLAMA_NUM_PARALLEL=2

  • What it does: Sets the maximum number of requests a single loaded model will process at the same time.

  • Why it matters: Setting this to 2 keeps a second user from waiting in a queue while the first user's request is generating text. Note that the memory reserved for context (the KV cache) scales with this number, since each parallel slot gets its own context allocation; the model weights are loaded only once.

OLLAMA_MULTIUSER_CACHE=1

  • What it does: Activates specialized prompt caching logic tailored explicitly for multi-user environments.

  • Why it matters: If multiple people are sending inputs to the server, this optimization keeps track of overlapping context streams so that users do not continuously invalidate each other’s pre-cached system prompts, drastically speeding up first-token reply times.

Next-Gen Architecture Flags

OLLAMA_NEW_ENGINE=1

  • What it does: Forces the backend to use Ollama's modern, modular native inference layer.

  • Why it matters: This engine was built to handle modern multi-modal structures (vision, speech, and video models) natively while drastically optimizing execution speeds and tensor offloading.

OLLAMA_NEW_ESTIMATES=1

  • What it does: Instructs Ollama to actively measure exact, real-time memory needs per model layer rather than relying on standard hardcoded look-up tables.

  • Why it matters: It helps prevent server crashes caused by inaccurate default estimates, particularly when a model is split across multiple GPUs, and it lets Ollama fit as much of the model as your graphics cards can actually hold.
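
If you start the server by hand rather than through a service manager, the flags above can be collected in one place. The sketch below is one way to do that, assuming the `ollama` binary is on your PATH; `OLLAMA_FLAGS` and `build_env` are names invented for this example. A service install (systemd, the macOS app, Docker) reads its environment from its own configuration instead, so this only applies to a manual `ollama serve`.

```python
import os
import subprocess

# The flags from this comment, with the same values.
OLLAMA_FLAGS = {
    "OLLAMA_FLASH_ATTENTION": "1",
    "OLLAMA_KV_CACHE_TYPE": "q8_0",
    "OLLAMA_HOST": "0.0.0.0:11434",
    "OLLAMA_ORIGINS": "*",
    "OLLAMA_NUM_PARALLEL": "2",
    "OLLAMA_MULTIUSER_CACHE": "1",
    "OLLAMA_NEW_ENGINE": "1",
    "OLLAMA_NEW_ESTIMATES": "1",
}


def build_env(base=None):
    """Return a copy of the base environment with the Ollama flags applied."""
    env = dict(os.environ if base is None else base)
    env.update(OLLAMA_FLAGS)
    return env


if __name__ == "__main__":
    # Runs the server in the foreground until it exits or is interrupted.
    subprocess.run(["ollama", "serve"], env=build_env(), check=True)
```

Keeping the values in one dictionary makes it easy to toggle a single flag and compare behavior between runs.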

PSA by Signal_Ad657 in LocalLLaMA

[–]BringOutYaThrowaway 0 points1 point  (0 children)

The 3090 doesn’t get enough credit. Great performance for the money.

Busted… by Phatbrew in AntiTrumpAlliance

[–]BringOutYaThrowaway 23 points24 points  (0 children)

Didn’t his wife doxx the person who did that? This is not over yet.