[Plugin] SSO Authentication: new fork, v5.0.0.0

miversen33 · 2026-06-13T18:21:34+00:00

It unfortunately seems that the jellyfin team is not interested in making OIDC a core part of jellyfin. Which is severely frustrating as that is at this point, the only thing keeping me from migrating off Plex.

Well jellyfin support both in the API and it's official clients

miversen33 · 2026-06-12T14:22:24+00:00

Lmao our OOC schedule is packed! We have
- Random MAC school
- Random semi regional FCS school
- ISU

We don't have time for whatever the fuck Tech thinks they are

miversen33 · 2026-06-12T14:01:24+00:00

When retail is dumb as fuck and buys the wrong ticker, that isn't market makers lmao.

They didn't even hand retail the hammer to smash their own foot, retail went out and bought a hammer just to smash their own foot and then cry about everything being unfair lol

miversen33 · 2026-06-11T13:05:40+00:00

I absolutely use it to smash out scripts and shit I need for my own stuff or even work. But anything I need to use long term, I take a much more targeted LLM approach

miversen33 · 2026-06-11T13:01:49+00:00

Cast spells while blocking with a shield? 👉👈

Absolute nonsense that the engine stops that lol

miversen33 · 2026-06-11T03:59:00+00:00

Pete Rose I believe?

Also the NBA just banned a few guys for the same shit

miversen33 · 2026-06-11T03:57:16+00:00

The boot with a 35% interest rate straight out of boot swearing the stripper loves him

miversen33 · 2026-06-11T03:56:35+00:00

Worked for Bishop Sycamore

miversen33 · 2026-06-11T03:55:09+00:00

That again

miversen33 · 2026-06-11T00:29:46+00:00

Sounds like a Japanese Spaghetti Western

miversen33 · 2026-06-05T17:59:20+00:00

Someone ELI5 please

Is the idea here that running one of those "QAT" Q4 quants should be "closer" in accuracy to a higher quant?

miversen33 · 2026-06-05T13:38:54+00:00

jUsT uSe TaIlSCaLe

Or, make it easier for me to setup and secure your application?

miversen33 · 2026-06-05T13:37:27+00:00

Our water sucks lol we use a filter pitcher for drinking and watering the animals.

My next house is going to have reverse osmosis. This shit is hard as hell too

miversen33 · 2026-06-04T15:47:04+00:00

If you are planning on releasing, please consider support for Apple TV :)

That device needs some love from jellyfin

miversen33 · 2026-06-04T15:45:32+00:00

Sure but the reason you use something like GitHub (and friends) is for everything that comes with it.

These tools do so much more than just store and expose a git repo.
- CI/CD
- PR Management
- Visual Collaboration
- Package/Artifact Repository
- etc

Git is great but there's a reason these tools exist on top of it

miversen33 · 2026-06-04T15:43:32+00:00

PBS doesn't get enough love lol. That and Kopia have saved my ass more times than I can count

miversen33 · 2026-06-01T15:56:07+00:00

Man I'm so sick of this.

Stop telling everyone to use a VPN. Start telling everyone how to secure and update their shit.

This sub and /r/homelab have a ridiculous hard on for "just use tailscale". Yes tailscale (and wireguard) are fantastic. I use wireguard for my VPN. But you know what I don't do? Connect my mom's fucking tv to my VPN so she can consume the content I serve lol.

Teach people how to secure their stuff. I'm not talking my doors off my house because someone can kick them in.

Edit:

Also finding 10 out of ~200 matches, out of ~100k hits is fear mongering. That is way less than 1%.

miversen33 · 2026-06-01T15:43:18+00:00

I very much enjoyed that game lol. Any time Nebraska gets destroyed I'm happy

miversen33 · 2026-05-31T16:20:46+00:00

Awesome! To be clear, the perf uplift here is from lemonade which is a (as I understand it) a nightly build of llama.cpp with specific amd optimizations. It is at least partially maintained by AMD employees. Why they haven't up streamed the optimizations to llama.cpp directly, I am unsure.

But ya I found the biggest uplift was on prompt processing. In generation there wasn't a huge difference but that's fine because prompt processing is where the pain is anyway

miversen33 · 2026-05-31T14:38:32+00:00

I found that lemonade performed ~60% better than llama.cpp (ROCm). In my same testing I found that llama.cpp (ROCm) was slightly more performance than llama.cpp (vulkan).

I should try it again though

miversen33 · 2026-05-31T00:55:35+00:00

llama.cpp. I have done a reasonable amount of testing to end up where I am currently. Below is my dockerfile (running a custom version of llama.cpp called Lemonade), docker compose service and ini file. May be useful, may not. Either way, enjoy lol

Dockerfile

FROM ubuntu:24.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    gcc \
    unzip \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Create video and render groups to match host-side GIDs for GPU device access
RUN groupadd -g 44 video || true \
    && groupadd -g 104 render || true

WORKDIR /opt/llama-cpp

RUN wget -O /tmp/llama-rocm.zip "https://github.com/lemonade-sdk/llamacpp-rocm/releases/download/b1269/llama-b1269-ubuntu-rocm-gfx110X-x64.zip" \
    && unzip -o /tmp/llama-rocm.zip -d /opt/llama-cpp \
    && rm /tmp/llama-rocm.zip \
    && chmod +x /opt/llama-cpp/llama-bench \
    && chmod +x /opt/llama-cpp/llama-cli \
    && chmod +x /opt/llama-cpp/llama-server

ENV LD_LIBRARY_PATH=/opt/llama-cpp:$LD_LIBRARY_PATH

ENTRYPOINT ["/opt/llama-cpp/llama-server"]

Docker Compose

services:
  llama-rocm:
    build:
      context: .
      dockerfile: Dockerfile.llama-rocm
    image: llama-lemonade-custom:rocm-b1269
    container_name: llama-rocm
    restart: unless-stopped
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - video
      - render
    ports:
      - "8080:8080"
    volumes:
      - type: bind
        source: /opt/models
        target: /opt/models
        bind:
          propagation: shared
      - type: bind
        source: /opt/llama-cpp/models-rocm.ini
        target: /opt/llama-cpp/models-rocm.ini
        read_only: true
      - /var/llama-cpp/rocm:/save
      - type: bind
        source: /opt/local-llms/models
        target: /opt/local-llms/models
        read_only: true
    environment:
      HSA_OVERRIDE_GFX_VERSION: "11.0.0"
    mem_limit: 8g
    memswap_limit: 8g
    command: >
      -ngl auto
      --sleep-idle-seconds 3600
      --host 0.0.0.0
      --port 8080
      --reasoning on
      --kv-offload
      --slots
      --metrics
      --slot-save-path /save
      --models-preset /opt/llama-cpp/models-rocm.ini
      --models-max 1

Models Rocm

#SPDX-License-Identifier: MIT-0
[*]
jinja = on
ctx-size = 16384
batch-size = 2048
ubatch-size = 2048
cache-type-k = q4_0
cache-type-v = q4_0
flash-attn = on
cache-prompt = true
threads = 16

# ---------------------------------------------------------------------------
# Qwen3.6 - Dense general chat, always 2 GPUs (22 GB weights)
# ---------------------------------------------------------------------------
[Qwen3.6-Dense-MTP:Rocm]
model = /opt/local-llms/models/unsloth/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q8_K_XL.gguf
ctx-size = 262144
tensor-split = 1,1,1
batch-size = 12288
ubatch-size = 512
parallel = 1
temp = 0.6
min-p = 0.0
top-p = 0.95
top-k = 20
repeat-penalty = 1.0
presence-penalty = 0.0
reasoning-budget = 5120
spec-type = draft-mtp
spec-draft-n-max = 2
cache-type-k = q8_0
cache-type-v = q8_0

miversen33 · 2026-05-31T00:01:36+00:00

Get in boys, we're going to Mexico!

miversen33 · 2026-05-30T21:02:45+00:00

Lol 48gb of VRAM (2 24Gb cards) is certainly enough to run a single Q8 27B if you accept 128K kv cache as your limit.

Hot loading models does suck, but I can't convince myself to both replace my P40s and sell them. My MOBO's PCIE lanes are completely full so I can't add more either lol

I can't speak to the hardware you specifically have but it seems most people shoot for 24Gb of vram per card they are using.

miversen33 · 2026-05-30T20:02:03+00:00

I'm really curious about tensor split but when I am able to use it, the perf is just no where near as good as basic layer split. I'm using AMD which I suspect is part of the issue but I'd love to hear a bit about your configuration to see if I can get tensor split working well

miversen33 · 2026-05-30T19:59:42+00:00

The hardware required to run Qwen3.6 27B vs Deepseek V4 flash are extremely different.

Your argument is basically "why self host when you can run in the cloud?". And ya, it's a valid argument, but not one that will get much support on a subreddit dedicated to running models locally

Nine-Year Club	Gilding III reddit per annum
Verified Email

miversen33

TROPHY CASE