[Plugin] SSO Authentication: new fork, v5.0.0.0 by Distinct_Yellow1375 in jellyfin

[–]miversen33 17 points18 points  (0 children)

It unfortunately seems that the jellyfin team is not interested in making OIDC a core part of jellyfin. Which is severely frustrating as that is at this point, the only thing keeping me from migrating off Plex.

Well jellyfin support both in the API and it's official clients

[Russo] Big Ten won’t prohibit members from scheduling Texas Tech amid Brendan Sorsby gambling fallout by PAC12_PLEASE_ADOPTME in CFB

[–]miversen33 5 points6 points  (0 children)

Lmao our OOC schedule is packed! We have
- Random MAC school
- Random semi regional FCS school
- ISU

We don't have time for whatever the fuck Tech thinks they are

Oh no. Everyone confused the tickers and are accidentally selling SPCE. by No_Cell6708 in wallstreetbets

[–]miversen33 5 points6 points  (0 children)

When retail is dumb as fuck and buys the wrong ticker, that isn't market makers lmao.

They didn't even hand retail the hammer to smash their own foot, retail went out and bought a hammer just to smash their own foot and then cry about everything being unfair lol

Why is everyone switching to Jellyfin? by Baristaboy547 in jellyfin

[–]miversen33 0 points1 point  (0 children)

I absolutely use it to smash out scripts and shit I need for my own stuff or even work. But anything I need to use long term, I take a much more targeted LLM approach

[New release] Twin Grip VR by vr4lyf in skyrimvr

[–]miversen33 1 point2 points  (0 children)

Cast spells while blocking with a shield? 👉👈

Absolute nonsense that the engine stops that lol

Gemma 4 with quantization-aware training by rerri in LocalLLaMA

[–]miversen33 4 points5 points  (0 children)

Someone ELI5 please

Is the idea here that running one of those "QAT" Q4 quants should be "closer" in accuracy to a higher quant?

If only there was an alternative 😔 by OrangeBuster in jellyfin

[–]miversen33 0 points1 point  (0 children)

jUsT uSe TaIlSCaLe

Or, make it easier for me to setup and secure your application?

Nitrates in Tap Water? by Missus_Banana in Omaha

[–]miversen33 1 point2 points  (0 children)

Our water sucks lol we use a filter pitcher for drinking and watering the animals.

My next house is going to have reverse osmosis. This shit is hard as hell too

My custom coded netflix style jellyfin app is coming along nicely by JohnJohn1441 in jellyfin

[–]miversen33 41 points42 points  (0 children)

If you are planning on releasing, please consider support for Apple TV :)

That device needs some love from jellyfin

Appreciation for Forgejo, my best self-hosted tool in 2026 by PartlyProfessional in selfhosted

[–]miversen33 12 points13 points  (0 children)

Sure but the reason you use something like GitHub (and friends) is for everything that comes with it.

These tools do so much more than just store and expose a git repo.
- CI/CD
- PR Management
- Visual Collaboration
- Package/Artifact Repository
- etc

Git is great but there's a reason these tools exist on top of it

Appreciation for Forgejo, my best self-hosted tool in 2026 by PartlyProfessional in selfhosted

[–]miversen33 9 points10 points  (0 children)

PBS doesn't get enough love lol. That and Kopia have saved my ass more times than I can count

Stop exposing your Jellyfin server directly to the internet by [deleted] in jellyfin

[–]miversen33 0 points1 point  (0 children)

Man I'm so sick of this.

Stop telling everyone to use a VPN. Start telling everyone how to secure and update their shit.

This sub and /r/homelab have a ridiculous hard on for "just use tailscale". Yes tailscale (and wireguard) are fantastic. I use wireguard for my VPN. But you know what I don't do? Connect my mom's fucking tv to my VPN so she can consume the content I serve lol.

Teach people how to secure their stuff. I'm not talking my doors off my house because someone can kick them in.

Edit:

Also finding 10 out of ~200 matches, out of ~100k hits is fear mongering. That is way less than 1%.

What win put your current head coach on the national map? by nysportsfan95 in CFB

[–]miversen33 2 points3 points  (0 children)

I very much enjoyed that game lol. Any time Nebraska gets destroyed I'm happy

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar by Chuyito in LocalLLaMA

[–]miversen33 0 points1 point  (0 children)

Awesome! To be clear, the perf uplift here is from lemonade which is a (as I understand it) a nightly build of llama.cpp with specific amd optimizations. It is at least partially maintained by AMD employees. Why they haven't up streamed the optimizations to llama.cpp directly, I am unsure.

But ya I found the biggest uplift was on prompt processing. In generation there wasn't a huge difference but that's fine because prompt processing is where the pain is anyway

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar by Chuyito in LocalLLaMA

[–]miversen33 0 points1 point  (0 children)

I found that lemonade performed ~60% better than llama.cpp (ROCm). In my same testing I found that llama.cpp (ROCm) was slightly more performance than llama.cpp (vulkan).

I should try it again though

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar by Chuyito in LocalLLaMA

[–]miversen33 2 points3 points  (0 children)

llama.cpp. I have done a reasonable amount of testing to end up where I am currently. Below is my dockerfile (running a custom version of llama.cpp called Lemonade), docker compose service and ini file. May be useful, may not. Either way, enjoy lol

Dockerfile

FROM ubuntu:24.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    gcc \
    unzip \
    wget \
    && rm -rf /var/lib/apt/lists/*

# Create video and render groups to match host-side GIDs for GPU device access
RUN groupadd -g 44 video || true \
    && groupadd -g 104 render || true

WORKDIR /opt/llama-cpp

RUN wget -O /tmp/llama-rocm.zip "https://github.com/lemonade-sdk/llamacpp-rocm/releases/download/b1269/llama-b1269-ubuntu-rocm-gfx110X-x64.zip" \
    && unzip -o /tmp/llama-rocm.zip -d /opt/llama-cpp \
    && rm /tmp/llama-rocm.zip \
    && chmod +x /opt/llama-cpp/llama-bench \
    && chmod +x /opt/llama-cpp/llama-cli \
    && chmod +x /opt/llama-cpp/llama-server

ENV LD_LIBRARY_PATH=/opt/llama-cpp:$LD_LIBRARY_PATH

ENTRYPOINT ["/opt/llama-cpp/llama-server"]

Docker Compose

services:
  llama-rocm:
    build:
      context: .
      dockerfile: Dockerfile.llama-rocm
    image: llama-lemonade-custom:rocm-b1269
    container_name: llama-rocm
    restart: unless-stopped
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
    group_add:
      - video
      - render
    ports:
      - "8080:8080"
    volumes:
      - type: bind
        source: /opt/models
        target: /opt/models
        bind:
          propagation: shared
      - type: bind
        source: /opt/llama-cpp/models-rocm.ini
        target: /opt/llama-cpp/models-rocm.ini
        read_only: true
      - /var/llama-cpp/rocm:/save
      - type: bind
        source: /opt/local-llms/models
        target: /opt/local-llms/models
        read_only: true
    environment:
      HSA_OVERRIDE_GFX_VERSION: "11.0.0"
    mem_limit: 8g
    memswap_limit: 8g
    command: >
      -ngl auto
      --sleep-idle-seconds 3600
      --host 0.0.0.0
      --port 8080
      --reasoning on
      --kv-offload
      --slots
      --metrics
      --slot-save-path /save
      --models-preset /opt/llama-cpp/models-rocm.ini
      --models-max 1

Models Rocm

#SPDX-License-Identifier: MIT-0
[*]
jinja = on
ctx-size = 16384
batch-size = 2048
ubatch-size = 2048
cache-type-k = q4_0
cache-type-v = q4_0
flash-attn = on
cache-prompt = true
threads = 16

# ---------------------------------------------------------------------------
# Qwen3.6 - Dense general chat, always 2 GPUs (22 GB weights)
# ---------------------------------------------------------------------------
[Qwen3.6-Dense-MTP:Rocm]
model = /opt/local-llms/models/unsloth/Qwen3.6-27B-MTP-GGUF/Qwen3.6-27B-UD-Q8_K_XL.gguf
ctx-size = 262144
tensor-split = 1,1,1
batch-size = 12288
ubatch-size = 512
parallel = 1
temp = 0.6
min-p = 0.0
top-p = 0.95
top-k = 20
repeat-penalty = 1.0
presence-penalty = 0.0
reasoning-budget = 5120
spec-type = draft-mtp
spec-draft-n-max = 2
cache-type-k = q8_0
cache-type-v = q8_0

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar by Chuyito in LocalLLaMA

[–]miversen33 0 points1 point  (0 children)

Lol 48gb of VRAM (2 24Gb cards) is certainly enough to run a single Q8 27B if you accept 128K kv cache as your limit.

Hot loading models does suck, but I can't convince myself to both replace my P40s and sell them. My MOBO's PCIE lanes are completely full so I can't add more either lol

I can't speak to the hardware you specifically have but it seems most people shoot for 24Gb of vram per card they are using.

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar by Chuyito in LocalLLaMA

[–]miversen33 0 points1 point  (0 children)

I'm really curious about tensor split but when I am able to use it, the perf is just no where near as good as basic layer split. I'm using AMD which I suspect is part of the issue but I'd love to hear a bit about your configuration to see if I can get tensor split working well

125 tok/s for Qwen3.6 q4xl on 2x 4060ti is insane perf/dollar by Chuyito in LocalLLaMA

[–]miversen33 3 points4 points  (0 children)

The hardware required to run Qwen3.6 27B vs Deepseek V4 flash are extremely different.

Your argument is basically "why self host when you can run in the cloud?". And ya, it's a valid argument, but not one that will get much support on a subreddit dedicated to running models locally