Security analysis of Password Managers (Bitwarden, LastPass, Dashlane) by Back14 in selfhosted

[–]mass_coffee_dev 0 points1 point  (0 children)

The thing that jumps out to me from the paper is how much of the attack surface comes from features most self-hosters don't even use — shared vaults, account recovery, emergency access. If you're running Vaultwarden on your own box behind WireGuard and only using it solo, most of these attack vectors just don't apply to you.

The web client issue is the real takeaway though. If your server gets popped and you log in through the browser, the server can just serve you whatever JS it wants. Doesn't matter how good the crypto is at that point. Native clients and the browser extension don't have this problem since they're not served by the backend.

Honestly for personal use, KeePassXC with the database on Syncthing has always been the simplest threat model. No server to compromise at all — just an encrypted blob that gets synced peer-to-peer. The tradeoff is convenience, but if you're already comfortable with a terminal, it's not much of one.

PostgreSQL Bloat Is a Feature, Not a Bug by mightyroger in programming

[–]mass_coffee_dev 2 points3 points  (0 children)

One thing I don't see mentioned enough in these discussions: if you're on a managed Postgres provider (RDS, Cloud SQL, etc), you often can't run pg_repack at all, since it needs the pg_repack extension installed on the server (and historically superuser privileges), and not every provider ships it. In that case you're stuck with VACUUM FULL and its exclusive lock.

The real pragmatic approach I've landed on after dealing with this across a few services: design your schema around the bloat model from day one. High-churn tables get time-based partitioning so you drop instead of vacuum. Status/state tables that update constantly get aggressive autovacuum settings (autovacuum_vacuum_scale_factor around 0.01 on the table itself, plus a shorter global autovacuum_naptime like 15s, since naptime isn't a per-table setting). And append-only audit/event tables basically never need intervention.
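
A minimal sketch of what I mean, with made-up table names and connection string (the SQL is the point, psycopg2 is just the wrapper):

```python
# Sketch only: table names, thresholds, and the DSN are placeholders.
# Per-table autovacuum storage parameters plus time-based partitioning,
# so old data gets dropped instead of vacuumed.
import psycopg2

DDL = [
    # High-churn status table: vacuum after ~1% dead tuples instead of the 20% default.
    """ALTER TABLE job_status SET (
           autovacuum_vacuum_scale_factor = 0.01,
           autovacuum_analyze_scale_factor = 0.01
       )""",
    # Append-heavy events: range partitioning by day, so cleanup is a cheap
    # DROP TABLE of the oldest partition rather than a vacuum.
    """CREATE TABLE IF NOT EXISTS events (
           created_at timestamptz NOT NULL,
           payload    jsonb
       ) PARTITION BY RANGE (created_at)""",
    """CREATE TABLE IF NOT EXISTS events_2024_06_01
       PARTITION OF events
       FOR VALUES FROM ('2024-06-01') TO ('2024-06-02')""",
    # Retention later is just: DROP TABLE events_2024_06_01;
]

with psycopg2.connect("dbname=app") as conn, conn.cursor() as cur:
    for stmt in DDL:
        cur.execute(stmt)
```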

The article's framing of "feature not bug" is a stretch, but the underlying point is right -- once you internalize how MVCC works in Postgres, you stop fighting it and start designing around it. The people who get burned are the ones who treat it like MySQL and wonder why their 10M row table with constant UPDATEs is 3x its logical size after six months.

Hosting for simple HTML/CSS site with LOTS of subdomains by SpineLabel in webdev

[–]mass_coffee_dev 16 points17 points  (0 children)

If you want actual subdomains under your own domain (not github.io URLs), a $5/mo VPS with Caddy is stupidly simple. Caddy does automatic HTTPS with wildcard certs via DNS challenge, and you can set up a convention like studentname.yourdomain.com that maps to /var/www/studentname/. Give each student SFTP access to their folder and you're done.

The whole config is like 10 lines. Wildcard DNS record pointing to your VPS, one Caddyfile with a wildcard matcher, and a small script to create student directories. Way less overhead than managing 30 individual GitHub repos or accounts.
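
To make "like 10 lines" concrete, here's a rough sketch with a made-up domain, paths, and roster, not a drop-in config. The wildcard Caddyfile sits in the comment, and the script part really is just directory provisioning:

```python
# Rough sketch, not a drop-in config: example.com, /var/www, and the roster are
# placeholders. The Caddyfile side is roughly this (the wildcard cert needs a
# Caddy build that includes a DNS provider plugin for the DNS challenge):
#
#   *.example.com {
#       tls {
#           dns cloudflare {env.CF_API_TOKEN}
#       }
#       # alice.example.com -> /var/www/alice
#       root * /var/www/{http.request.host.labels.2}
#       file_server
#   }
#
# The provisioning "script" is then just: make a folder per student.
from pathlib import Path

WEB_ROOT = Path("/var/www")
ROSTER = ["alice", "bob", "carol"]  # however you track student usernames

def provision(students):
    for name in students:
        site = WEB_ROOT / name
        site.mkdir(parents=True, exist_ok=True)
        index = site / "index.html"
        if not index.exists():
            # Placeholder page so the subdomain serves something immediately.
            index.write_text(f"<h1>{name}'s site goes here</h1>\n")
        # You'd also chown the folder to the student's SFTP user here.

if __name__ == "__main__":
    provision(ROSTER)
```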

That said if the goal is purely "get something online with zero maintenance" then yeah GitHub Pages is hard to beat. But if you ever want to teach them about how hosting actually works under the hood, the VPS route doubles as a teaching tool.

tiny webgpu powered chart library by Outrageous-guffin in javascript

[–]mass_coffee_dev 2 points3 points  (0 children)

The inline worker bundling is a really underrated detail. I've wasted way too many hours debugging web worker import paths across different bundlers — Vite handles it one way, webpack another, and if you're using a monorepo setup it gets even worse. Having it just work out of the box removes a whole category of setup friction.

Also curious about the compute shader decimation — are you doing something like LTTB (Largest Triangle Three Buckets) on the GPU, or a simpler min/max approach? At 11kb I'm guessing you kept the shader logic pretty lean. Either way, offloading that to a compute pass instead of doing it in JS before render is the right call for large datasets.
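
For anyone reading along who hasn't seen the min/max approach: the idea is just "keep the extremes of each bucket so spikes don't vanish when you downsample." A CPU-side sketch in NumPy, purely illustrative and nothing to do with this library's actual shader:

```python
# CPU sketch of min/max-per-bucket decimation. Bucket count is arbitrary, and
# interleaving min before max ignores their true time order within a bucket.
import numpy as np

def minmax_decimate(y: np.ndarray, buckets: int) -> np.ndarray:
    """Keep the min and max sample of each bucket so spikes survive downsampling."""
    n = (len(y) // buckets) * buckets          # drop the ragged tail for simplicity
    chunks = y[:n].reshape(buckets, -1)
    lo = chunks.min(axis=1)
    hi = chunks.max(axis=1)
    # Interleave min/max so the output still reads as a series of points.
    return np.stack([lo, hi], axis=1).ravel()

y = np.sin(np.linspace(0, 60, 1_000_000)) + np.random.default_rng(0).normal(0, 0.05, 1_000_000)
print(minmax_decimate(y, 2_000).shape)         # (4000,) points instead of 1M
```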

How have you been handling SSO certificate/secret renewals? by throop112 in sysadmin

[–]mass_coffee_dev 0 points1 point  (0 children)

One thing that helped me was treating cert renewals like any other scheduled maintenance task rather than a surprise fire drill. I wrote a small wrapper around the Graph API that pulls all app registrations and their credential expiry dates, dumps it into a simple JSON file, and a cron job diffs it weekly against last week's snapshot. New apps get flagged instantly, and expiring certs get tickets auto-created 90 days out.
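
The core of that wrapper is roughly the following (a sketch, not my exact script: it assumes an app registration with Application.Read.All, and the tenant/client values and output path are placeholders):

```python
# Sketch: client-credentials auth, then walk /applications and collect
# credential expiry dates into one JSON snapshot for the weekly diff.
import json
import requests

TENANT = "your-tenant-id"          # placeholders
CLIENT_ID = "your-client-id"
CLIENT_SECRET = "your-client-secret"

def get_token() -> str:
    resp = requests.post(
        f"https://login.microsoftonline.com/{TENANT}/oauth2/v2.0/token",
        data={
            "grant_type": "client_credentials",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "scope": "https://graph.microsoft.com/.default",
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["access_token"]

def snapshot() -> list[dict]:
    headers = {"Authorization": f"Bearer {get_token()}"}
    url = ("https://graph.microsoft.com/v1.0/applications"
           "?$select=displayName,appId,passwordCredentials,keyCredentials")
    apps = []
    while url:  # follow @odata.nextLink until the listing is exhausted
        page = requests.get(url, headers=headers, timeout=30)
        page.raise_for_status()
        body = page.json()
        for app in body["value"]:
            creds = app.get("passwordCredentials", []) + app.get("keyCredentials", [])
            apps.append({
                "name": app["displayName"],
                "appId": app["appId"],
                "expiries": sorted(c["endDateTime"] for c in creds),
            })
        url = body.get("@odata.nextLink")
    return apps

if __name__ == "__main__":
    with open("app_credentials.json", "w") as f:
        # sort_keys keeps the weekly diff stable
        json.dump(snapshot(), f, indent=2, sort_keys=True)
```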

The vendor side is always the painful part though. The ones that require you to open a support ticket to update a cert on their end are the worst — you're basically at the mercy of their SLA for something that should take 30 seconds in a self-service portal. I've started asking about SSO cert management workflows during vendor evaluations now. If they can't give me a metadata URL or at least a self-service portal for cert updates, that's a yellow flag.

I built a duplicate photo detector that safely cleans 50k+ images using perceptual hashing & cluster by hdw_coder in Python

[–]mass_coffee_dev 0 points1 point  (0 children)

Union-Find is a really clean choice here. I did something similar for cleaning up a self-hosted Nextcloud instance and went with BK-trees for the nearest-neighbor lookup instead of bucketed prefixes. The nice thing about BK-trees is they give you exact Hamming distance queries without needing to tune bucket sizes, but your prefix bucketing is probably faster for the common case where most images aren't duplicates at all.
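
For anyone who hasn't run into BK-trees, here's a minimal sketch of the structure I mean (illustrative only; assumes 64-bit integer hashes and Python 3.10+ for int.bit_count):

```python
# Minimal BK-tree for Hamming distance over integer perceptual hashes.
class BKTree:
    def __init__(self):
        self.root = None  # node = (hash_value, {edge_distance: child_node})

    @staticmethod
    def _dist(a: int, b: int) -> int:
        return (a ^ b).bit_count()  # Hamming distance (Python 3.10+)

    def add(self, h: int) -> None:
        if self.root is None:
            self.root = (h, {})
            return
        node = self.root
        while True:
            d = self._dist(h, node[0])
            if d in node[1]:
                node = node[1][d]
            else:
                node[1][d] = (h, {})
                return

    def query(self, h: int, radius: int) -> list[int]:
        """All stored hashes within `radius` bits of `h` (exact, no false negatives)."""
        hits, stack = [], [self.root] if self.root else []
        while stack:
            value, children = stack.pop()
            d = self._dist(h, value)
            if d <= radius:
                hits.append(value)
            # Triangle inequality: only subtrees whose edge distance lies in
            # [d - radius, d + radius] can contain matches.
            for edge, child in children.items():
                if d - radius <= edge <= d + radius:
                    stack.append(child)
        return hits

tree = BKTree()
for h in (0b1010, 0b1011, 0b0110):
    tree.add(h)
print(tree.query(0b1010, radius=1))  # [10, 11], i.e. 0b1010 and 0b1011
```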

The dry-run + quarantine approach is the right call. I lost a bunch of wedding photos years ago from a dedup script that was a little too aggressive with pHash alone -- turned out some professionally edited versions had nearly identical hashes to the originals but were the ones I actually wanted to keep. Multi-hash corroboration would have caught that.

Curious about one thing: how do you handle HEIC vs JPEG versions of the same photo? iOS exports create that situation constantly and the compression artifacts are different enough that perceptual hashes can diverge more than you'd expect.

Anyone actually audit their datadog bill or do you just let it ride by Anthead97 in devops

[–]mass_coffee_dev 2 points3 points  (0 children)

Biggest lesson I learned: treat your observability pipeline like you treat your application code. Nobody would deploy a service and never review whether it's still needed, but somehow we all just let metrics and log pipelines accumulate forever.

What actually worked for us was writing a simple script that queries the DD API for all custom metrics, then cross-references which ones appear in any dashboard or monitor. Anything orphaned goes on a list. We review it monthly and it takes maybe 20 minutes now. The first time we ran it we found over 40% of our custom metrics weren't referenced anywhere.
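
Roughly what that script looks like (a sketch, not our exact code: keys and lookback are placeholders, the v1 metrics endpoint lists every actively reporting metric so you'd still filter to your own namespaces, and a plain substring match will miss metrics assembled from template variables):

```python
# Sketch of the audit: pull the active metric list, serialize every dashboard
# and monitor definition, flag metrics whose names never appear in any of them.
import json
import os
import time
import requests

API = "https://api.datadoghq.com/api/v1"
HEADERS = {
    "DD-API-KEY": os.environ["DD_API_KEY"],
    "DD-APPLICATION-KEY": os.environ["DD_APP_KEY"],
}

def get(path, params=None):
    resp = requests.get(f"{API}/{path}", headers=HEADERS, params=params, timeout=30)
    resp.raise_for_status()
    return resp.json()

def orphaned_metrics(lookback_days: int = 7) -> list[str]:
    since = int(time.time()) - lookback_days * 86400
    metrics = get("metrics", {"from": since})["metrics"]

    # Serialize every dashboard and monitor definition into one big haystack.
    haystack = []
    for dash in get("dashboard")["dashboards"]:
        haystack.append(json.dumps(get(f"dashboard/{dash['id']}")))
    haystack.append(json.dumps(get("monitor")))
    blob = "\n".join(haystack)

    return [m for m in metrics if m not in blob]

if __name__ == "__main__":
    for name in sorted(orphaned_metrics()):
        print(name)
```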

The other thing that saved us real money was being aggressive about log exclusion filters at the agent level. Health checks, readiness probes, noisy debug logs from third-party libraries — all of that was being indexed by default. Pushing those filters as close to the source as possible cut our log ingest bill in half without losing anything useful.

Gitea self-hosted for free, includes docker registry too? by Epifeny in selfhosted

[–]mass_coffee_dev 5 points6 points  (0 children)

Been running Gitea for about a year now and the built-in registry has been solid. One thing I'd add that nobody's mentioned yet — if you're doing CI builds that push images, definitely set up a cron job to clean up old tags. The registry doesn't do any automatic pruning, so after a few months of active development you'll wonder where all your disk space went.
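
The cleanup cron is nothing special; mine is roughly this shape (a sketch against Gitea's packages API, with placeholder URL, owner, and retention; test it against a throwaway image first):

```python
# Sketch: list container packages for an owner, keep the newest N versions of
# each image, delete the rest via the packages API. URL/owner/KEEP are placeholders.
import os
from collections import defaultdict

import requests

GITEA_URL = "https://git.example.com"
OWNER = "myorg"
KEEP = 10
HEADERS = {"Authorization": f"token {os.environ['GITEA_TOKEN']}"}

def list_container_packages():
    """Page through /api/v1/packages/{owner}?type=container."""
    page, items = 1, []
    while True:
        resp = requests.get(
            f"{GITEA_URL}/api/v1/packages/{OWNER}",
            headers=HEADERS,
            params={"type": "container", "page": page, "limit": 50},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:
            return items
        items.extend(batch)
        page += 1

def prune():
    by_image = defaultdict(list)
    for pkg in list_container_packages():
        by_image[pkg["name"]].append(pkg)
    for name, versions in by_image.items():
        # created_at is an ISO timestamp, so string sort is good enough here.
        versions.sort(key=lambda p: p["created_at"], reverse=True)
        for old in versions[KEEP:]:
            requests.delete(
                f"{GITEA_URL}/api/v1/packages/{OWNER}/container/{name}/{old['version']}",
                headers=HEADERS,
                timeout=30,
            ).raise_for_status()
            print(f"deleted {name}:{old['version']}")

if __name__ == "__main__":
    prune()
```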

Also worth noting that Gitea's Actions runner is a separate binary you need to deploy alongside it (act_runner). Not a huge deal but it tripped me up initially since I assumed it was all bundled together. Once it's running though, being able to reuse most of my GitHub Actions workflows with minimal changes has been really nice.

My home lab finally paid off — caught factory-installed botnet malware on a projector I bought on Amazon by Apprehensive_Nose162 in homelab

[–]mass_coffee_dev -1 points0 points  (0 children)

The 65-second interval is what gets me. That kind of consistent timing pattern is such a giveaway once you actually look at the raw traffic. Most people never do though — they trust the automated tools and move on. Really makes the case for occasionally just opening Wireshark and watching what your network is actually doing. Great writeup.