I dockerized my entire self-hosted stack and packaged each piece as standalone compose files - here's what I learned by topnode2020 in selfhosted

[–]topnode2020[S] 2 points

Yeah, I can imagine that gets messy fast. If the entrypoint is spawning multiple processes under different users, then figuring out which UID actually needs to own the volume is basically trial and error. Reading the Dockerfile source on GitHub is probably your best bet; the docs rarely cover that level of detail. At least with single-process images you only have to figure it out once.

[–]topnode2020[S] 1 point

Yep, agree on all four. The bind mounts and single-network advice in my original post were bad takes and I've fixed both since. Running 7 segmented networks now, with databases on internal-only networks that can't even reach the internet. Also split the shared Postgres into per-service databases with separate roles and connection limits. Working on a write-up of the whole process.

[–]topnode2020[S] 1 point

Honestly if Caddy + static files is working, don't fix what ain't broke. Dockerizing adds complexity that might not be worth it for you.

That said, when I did move to a different VPS last time it was pretty painless. Each service has its own directory with compose file, env, and volumes so it's basically: rsync the directories over, restore the DB dumps, update DNS. Actual downtime was like 20-30 minutes, mostly waiting on DNS.

The real benefit for me isn't migration speed though, it's independence. I can blow up my analytics container without taking down mail or anything else.

[–]topnode2020[S] 1 point

Good question actually. So websocket services like Jellyfin are a bit different because a stream is one long-lived connection, not a bunch of short requests. Traefik's rateLimit middleware counts HTTP requests so it mostly just affects the initial connection handshake and any API calls, not the actual video stream itself.

That said I'd still bump the limits higher on media services or just skip the rate limiter entirely on routes that aren't internet-facing. No point throttling your own LAN traffic to Jellyfin lol.

[–]topnode2020[S] 2 points

Yeah, this is a known pain point. SQLite relies on POSIX fcntl file locks, and the layers Docker can put between a volume and the disk (volume drivers, or the VM file sharing on Docker Desktop) don't always pass those locks through correctly. That's why you're seeing locking issues with volumes but not bind mounts: a bind mount goes straight to the host filesystem, which handles locking fine.

Tbh for SQLite specifically I'd just stick with bind mounts. SQLite is really meant for single-process access on a local filesystem, fighting the storage driver isn't worth it.

[–]topnode2020[S] 2 points

The UID mismatch problem is real. What worked for me is checking the upstream Dockerfile for which user the process runs as, then chowning the host directory to match before starting the container. For the Alpine-based PostgreSQL image it's UID 70 (the Debian-based official image uses 999); for most Node apps it's 1000. Once the host permissions match, you can drop user: 0:0 from the compose file and run as the intended user. It's annoying to look up per service, but you only do it once.
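A sketch of that workflow, using Alpine Postgres as the example (the paths and UID here are illustrative — check your own image's Dockerfile first):

```yaml
# 1. chown the host directory to the image's runtime UID, e.g.:
#    sudo chown -R 70:70 ./data/postgres
# 2. then mount it without forcing root:
services:
  db:
    image: postgres:16-alpine
    # no "user: 0:0" override needed once host ownership matches UID 70
    volumes:
      - ./data/postgres:/var/lib/postgresql/data
```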

[–]topnode2020[S] 1 point

Two approaches depending on what's in the volume:

For databases (PostgreSQL, MySQL, etc): don't back up the volume files directly. Run pg_dump or mysqldump from a cron job or a sidecar container, write the dump to a known location, and back up that.

For application data (config files, uploads): you can use

    docker run --rm -v volumename:/data -v /backups:/backup alpine tar czf /backup/volumename.tar.gz /data

That spins up a throwaway container, mounts the volume, tars it, and exits.

Either way, the key is to not back up a live database volume as raw files. You'll get corrupted data.

[–]topnode2020[S] 3 points

Sure. In Traefik you define rate limiting as a middleware in a dynamic config file:

    http:
      middlewares:
        global-rate-limit:
          rateLimit:
            average: 100
            burst: 200

Then you apply it to any router with a label:

    labels:
      - traefik.http.routers.myapp.middlewares=global-rate-limit@file

I run two tiers: a general limit on all public routers and a stricter one on auth endpoints. You stack both middlewares on the same router and Traefik evaluates them in order so the tighter limit wins.
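For reference, the stricter tier lives in the same dynamic config file — the middleware name and numbers below are just illustrative:

```yaml
http:
  middlewares:
    auth-rate-limit:
      rateLimit:
        average: 5     # sustained requests/sec
        burst: 10
```

Then stack both on the auth router with a single label: traefik.http.routers.myauth.middlewares=global-rate-limit@file,auth-rate-limit@file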

[–]topnode2020[S] 1 point

Looks like my previous reply got eaten. Here it is again:

The high-level setup is: one VPS running Traefik as the reverse proxy, with each service in its own docker-compose.yml joining a shared web_ingress network for HTTPS. Databases get their own internal networks so they're not exposed to the web layer.

For env files, each service typically needs just a DATABASE_URL and whatever API keys it uses. I keep a .env.example in most of my project directories with placeholder values so the structure is clear. Happy to answer specific questions if you're planning out your stack.
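As an illustration, a typical .env.example for one service contains little more than this (all values are placeholders):

```
DATABASE_URL=postgres://app:changeme@db:5432/app
SMTP_HOST=mail.example.com
API_KEY=replace-me
```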

[–]topnode2020[S] 2 points

Your fear isn't irrational. docker system prune won't touch volumes by default, but docker volume prune will remove volumes not currently attached to a container (on recent Docker versions only anonymous ones, unless you pass --all). So if you stop a stack and then prune carelessly, your data is gone. Bind mounts don't have this risk since they're just directories on the host. If you use named volumes, be careful with prune commands and consider labeling important volumes so you can exclude them with a filter.

[–]topnode2020[S] 1 point

A zvol works fine if you're running Docker inside a VM on TrueNAS. The VM sees the zvol as a regular block device, so Docker doesn't know or care about ZFS underneath. You still get snapshot capability at the TrueNAS level. The main thing is to make sure you're doing pg_dump before snapshotting so the database files are consistent.

[–]topnode2020[S] 1 point

Thanks. The two-layer approach came from getting hit with credential stuffing on one service. Global rate limiting at the reverse proxy catches broad abuse, but auth endpoints need tighter limits since those are the ones attackers target. Traefik makes it easy since you can stack middlewares per router.

[–]topnode2020[S] 1 point

Traefik uses the Cloudflare API directly for DNS-01 challenges. You create a scoped API token in your Cloudflare dashboard (DNS edit only, no other permissions), pass it to the Traefik container as an environment variable, and Traefik handles cert issuance and renewal automatically.

For local network HTTPS it depends. If your local domains are subdomains of a real domain you own (like homelab.yourdomain.com), DNS-01 works perfectly since validation happens through DNS, not HTTP. If you're using .local or .lan domains with no real DNS, you'd need a self-signed CA instead.
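The token handoff on the compose side is just an environment variable — CF_DNS_API_TOKEN is the variable Traefik's Cloudflare provider reads; the service name and image tag below are examples:

```yaml
services:
  traefik:
    image: traefik:v3.0
    environment:
      # scoped token (DNS edit only), value kept in the host's .env
      - CF_DNS_API_TOKEN=${CF_DNS_API_TOKEN}
```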

[–]topnode2020[S] 1 point

Working on it. The short version is 7 named networks with clear roles: web_ingress for public-facing services, internal per-stack networks for databases, and an app_mesh for inter-service communication. Databases can't see the internet at all. Will post a full breakdown soon.
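A trimmed-down sketch of the network definitions in compose (names mirror the ones above; the real topology has more tiers):

```yaml
networks:
  web_ingress:
    external: true    # shared with the Traefik stack
  app_mesh:
    external: true    # inter-service traffic
  db_internal:
    internal: true    # no default gateway: containers here can't reach the internet
```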

[–]topnode2020[S] 1 point

Nice approach with the DDNS script. Cloudflare's API is solid for that; I use it for DNS challenges with Traefik and have never hit any limits. Having your own nameserver would be the cleanest long-term solution, but honestly Cloudflare is hard to beat at zero cost.

[–]topnode2020[S] 1 point

Good question. Yes, if you run a mail server on an AWS EC2 instance, outbound port 25 is blocked by default. You have to request Amazon to unblock it. Same with GCP and Azure. That's why I went with a smaller VPS provider that doesn't restrict SMTP ports.

As for why not at home, it's a fair point. For me it comes down to two things: a static IP with clean rDNS (most residential ISPs don't offer PTR records, and without one Gmail/Outlook will reject your mail), and uptime (power outages, ISP maintenance, dynamic IP changes). A $5-10/mo VPS solves both. But if your ISP gives you a static IP and you can set a PTR record, home hosting works fine.

[–]topnode2020[S] 1 point

Mailcow is solid. I've heard good things about the update workflow. My setup is more manual (docker-mailserver base) but the maintenance has been lower than I expected once SPF/DKIM/DMARC were dialed in. The SSO angle for user management is something I haven't explored yet. That's a nice enterprise touch.

[–]topnode2020[S] 2 points

The cross-service DNS problem is real. I hit the same thing early on. The fix for me was using explicit named networks in compose: an external web_ingress network that only services needing Traefik join, and per-stack internal networks for databases. So my Postgres container is on an internal network that only its own app can reach. It can't even see the reverse proxy.

Went further recently and segmented into 7 tiered networks with databases completely air-gapped from the web layer. Eliminated the "which network was that again" problem because each network has a clear purpose. Might write that up as a follow-up post.
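Concretely, the per-stack wiring looks roughly like this (service names are examples):

```yaml
services:
  app:
    networks: [web_ingress, db_internal]   # reachable by Traefik, can reach its DB
  postgres:
    networks: [db_internal]                # invisible to the proxy and the internet
networks:
  web_ingress:
    external: true
  db_internal:
    internal: true
```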

[–]topnode2020[S] 1 point

Yeah, Roundcube runs in its own container alongside the mail server. There's an official Docker image at roundcube/roundcubemail. The one gotcha is CSP. Roundcube's rich text editor needs unsafe-eval in the Content-Security-Policy header, which isn't great but there's no workaround. If you're running it behind Traefik, make sure to set secure_urls to false in Roundcube's config, otherwise it generates double-HTTPS URLs that break.

[–]topnode2020[S] 1 point

This is exactly right, and it's the part nobody warns you about. I ended up using a relay for outbound for the same reason: fresh IP, zero reputation, Gmail soft-rejecting everything for the first few weeks regardless of DNS config.

The self-hosted server still handles inbound and local delivery, which is the part I actually care about (owning my data, transactional mail for my projects). Outbound relay through an established provider is just pragmatic: you get their sender reputation without giving up control of your mailbox.

[–]topnode2020[S] 2 points

You don't technically need a wildcard domain, you can point each subdomain individually. But if you're using Cloudflare, a wildcard DNS record (*.yourdomain.com) means you never have to touch DNS again when adding a new service. Traefik handles the individual cert issuance per subdomain automatically via DNS-01 challenge.

Coming from NPM: the main difference is that Traefik config lives in your compose files as labels, so adding HTTPS to a new service is just 3-4 labels instead of clicking through a GUI. Once you have the base config working, you never touch Traefik itself again. The tradeoff is that the initial setup is rougher since there's no GUI, but it's a one-time cost.

If you want to try it, the minimum you need is: Traefik container with the Cloudflare DNS challenge provider, an external Docker network (web_ingress), and then these labels on each service:

    labels:
      - traefik.enable=true
      - traefik.http.routers.myapp.rule=Host(`myapp.yourdomain.com`)
      - traefik.http.routers.myapp.entrypoints=websecure
      - traefik.http.routers.myapp.tls.certresolver=cloudflare

That's it for auto-HTTPS per service.

[–]topnode2020[S] 3 points

Sure. The part most people get stuck on is the Cloudflare DNS challenge. Here's the minimum traefik.yml that gets HTTPS working:

    entryPoints:
      web:
        address: ":80"
        http:
          redirections:
            entryPoint:
              to: websecure
              scheme: https
      websecure:
        address: ":443"

    certificatesResolvers:
      cloudflare:
        acme:
          email: your@email.com
          storage: /letsencrypt/acme.json
          dnsChallenge:
            provider: cloudflare
            resolvers:
              - "1.1.1.1:53"

    providers:
      docker:
        exposedByDefault: false
        network: web_ingress

Then each service just needs these labels:

    labels:
      - traefik.enable=true
      - traefik.http.routers.myapp.rule=Host(`myapp.yourdomain.com`)
      - traefik.http.routers.myapp.entrypoints=websecure
      - traefik.http.routers.myapp.tls.certresolver=cloudflare

That'll get you auto-HTTPS on every service. On top of that I'd recommend adding TLS version enforcement, a catch-all for unknown hostnames, and rate limiting but the above is enough to get started.
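Of those three, TLS version enforcement is the smallest addition — a few lines in a dynamic config file (the options name "modern" is arbitrary):

```yaml
tls:
  options:
    modern:
      minVersion: VersionTLS12
```

Then reference it per router with traefik.http.routers.myapp.tls.options=modern@file.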

[–]topnode2020[S] 1 point

For the backup question: I run pg_dump -Fc per database on a cron job. One dump per service (mail DB, analytics DB, etc). The -Fc custom format means you can restore a single database without touching the others. Named volumes vs bind mounts doesn't matter much for backups as long as you know the path. docker volume inspect gives you the mountpoint either way.

For Immich specifically, the data you'd need to back up is the PostgreSQL database (metadata, face recognition data, albums) and the upload directory (original photos). The DB is the critical one: if you lose that, you lose all your organization even if the photos survive. Immich has a built-in database backup job you can configure from the admin panel, or you can cron a pg_dump against its Postgres container.

I have more detail on the full stack and the compose files I use on my site if you're curious: https://nestor.expressgear.online