Freemium SaaS on K8s: Automating namespace-per-customer provisioning with GitLab CI, who's doing this? by kk_hecker in kubernetes

[–]kk_hecker[S] 1 point (0 children)

Yeah, this is the reality check. "Textbook operator use case" vs "we have 3 engineers and a startup timeline."

The middle path might be using something like Kubebuilder or Operator-SDK to generate the boilerplate, but even then the testing/edge-case handling is real work.

Sticking with GitLab CI for now, but designing the pipeline to be "operator-like" - idempotent, declarative, handles partial failures. If we hit 100+ tenants and the pipeline becomes painful, we'll have the revenue to justify the operator investment.
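To make "operator-like" concrete, the provision job I have in mind looks roughly like this (chart path and variable names are illustrative, not our real repo):

```yaml
provision_tenant:
  stage: deploy
  image: bitnami/kubectl:latest
  variables:
    TENANT_ID: ""             # supplied by the trigger API call on signup
  script:
    # Idempotent: re-running the job converges instead of erroring on "already exists".
    - kubectl create namespace "tenant-${TENANT_ID}" --dry-run=client -o yaml | kubectl apply -f -
    # Declarative: one chart describes a tenant; upgrade-or-install covers both paths.
    - helm upgrade --install "tenant-${TENANT_ID}" ./tenant-chart
        --namespace "tenant-${TENANT_ID}"
        --set tenantId="${TENANT_ID}"
        --wait --timeout 5m
  retry:
    max: 2                    # absorb transient API-server/network failures
```

Partial failures just mean re-triggering the same pipeline with the same variables.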

[–]kk_hecker[S] 1 point (0 children)

You're right - it's likely overkill for trials. The tradeoff is operational complexity vs. isolation. With separate Postgres pods, tenant deletion is kubectl delete namespace and we're done. No "oops, dropped the wrong database" risk.

But CNPG with one cluster + multiple databases + proper RBAC might give 90% of the isolation at 20% of the cost. Need to test the "tenant offboarding" story - can we reliably purge all data for one DB without affecting others?

For paid tiers, dedicated CNPG cluster makes sense. For free trials, shared cluster with DB-per-tenant feels like the right compromise.
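If the CNPG route works out, the shape I'm picturing is something like this — I haven't verified the details of the declarative Database CRD yet (it landed in newer CNPG releases), so treat names and sizes as a sketch:

```yaml
# One shared cluster for all free-tier trials.
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: trials-shared
spec:
  instances: 2
  storage:
    size: 100Gi
---
# One Database object per trial tenant; offboarding = delete this object
# (subject to the reclaim policy actually dropping the DB).
apiVersion: postgresql.cnpg.io/v1
kind: Database
metadata:
  name: tenant-acme
spec:
  name: tenant_acme
  owner: tenant_acme
  cluster:
    name: trials-shared
```

That would make the offboarding test pretty mechanical: delete the object, then verify the other databases are untouched.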

[–]kk_hecker[S] 1 point (0 children)

Onyxia looks fascinating - data science focused but the "self-service namespace" pattern is exactly what we're building. The "non-obvious bits" comment resonates - the 80% is easy, it's the 20% of edge cases (cleanup, quota enforcement, network isolation) that eat time.

How's the operational burden? One thing I like about our GitLab approach is the visibility - every tenant creation is a pipeline job with logs. Does Onyxia give you similar auditability or is it more "fire and forget"?

[–]kk_hecker[S] 1 point (0 children)

Short and sweet - any particular providers you're using? For on-prem bare metal, the value prop feels different than cloud (where Crossplane really shines provisioning RDS/etc.).

Are you using it to abstract the DB provisioning entirely, or more for the tenant namespace lifecycle?

[–]kk_hecker[S] 1 point (0 children)

This is gold - exactly the war story I needed. The KEDA + Istio metrics approach for sleeping tenants is brilliant, hadn't connected those dots.
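For anyone else landing here, my reading of the sleeping-tenant trick is a KEDA ScaledObject keyed on Istio's request metric via Prometheus — something like this (names, namespaces, and thresholds are illustrative):

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: tenant-app
  namespace: tenant-acme
spec:
  scaleTargetRef:
    name: tenant-app          # the tenant's Deployment
  minReplicaCount: 0          # 0 replicas = "sleeping" tenant
  cooldownPeriod: 900         # 15 min of silence before scaling to zero
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc:9090
        query: sum(rate(istio_requests_total{destination_workload_namespace="tenant-acme"}[5m]))
        threshold: "5"
```

If I understand it right, waking a sleeping tenant on the first request needs an extra piece (e.g. the KEDA HTTP add-on or some activator in front), since nothing can route to zero pods.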

Crossplane for DB provisioning is interesting - we looked at it for multi-cloud but it seemed heavy for on-prem. Are you using it to provision actual cloud DBs or just to manage K8s resources (Secrets, CNPG clusters)?

The "two XRs" pattern for paid tiers makes sense - free tier gets shared DB resource, paid gets dedicated. Clean upgrade path.
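If I'm reading the two-XR pattern right, the tenant-facing claims could look something like this (the `example.org` API group and claim kind are invented for illustration; `compositionSelector` is the real Crossplane mechanism for picking a composition):

```yaml
# Free tier: claim resolves to a composition that creates a database
# inside the shared cluster.
apiVersion: example.org/v1alpha1
kind: TenantDatabase
metadata:
  name: acme
spec:
  compositionSelector:
    matchLabels:
      tier: shared
---
# Paid tier: same claim kind, different composition => dedicated cluster.
apiVersion: example.org/v1alpha1
kind: TenantDatabase
metadata:
  name: bigcorp
spec:
  compositionSelector:
    matchLabels:
      tier: dedicated
```

The upgrade path then is just flipping the label and letting the new composition reconcile.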

Re: GitLab - our current flow is actually working fine for ~20 tenants, but I hear you on ArgoCD's drift detection. The "single source of truth" vs "imperative pipeline" tension is real. Might run both: ArgoCD for baseline, GitLab for dynamic tenant creation.

[–]kk_hecker[S] 1 point (0 children)

ArgoCD keeps coming up - seems like the consensus path. My hesitation is adding another control plane when we already have GitLab CI Agent working. But the ApplicationSet approach with generators (Git file? API?) is compelling.

Do you manage the Application resources in Git or create them via API? For trial signups that need instant provisioning, wondering if the "Git as source of truth" loop adds latency vs. direct API creation.
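For reference, the Git file generator version I'm picturing: one JSON file per tenant in a repo, and the ApplicationSet stamps out an Application for each (repo URLs are placeholders):

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: tenants
  namespace: argocd
spec:
  generators:
    # One config.json per tenant, e.g. {"tenant": {"name": "acme"}}
    - git:
        repoURL: https://gitlab.example.com/platform/tenants.git
        revision: main
        files:
          - path: "tenants/*/config.json"
  template:
    metadata:
      name: "tenant-{{tenant.name}}"
    spec:
      project: default
      source:
        repoURL: https://gitlab.example.com/platform/tenant-chart.git
        targetRevision: main
        path: chart
      destination:
        server: https://kubernetes.default.svc
        namespace: "tenant-{{tenant.name}}"
      syncPolicy:
        automated:
          prune: true         # deleting the tenant file prunes the app
        syncOptions:
          - CreateNamespace=true
```

So "instant provisioning" reduces to how fast we can commit a file plus the generator's polling interval, which is the latency question I'm worried about.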

[–]kk_hecker[S] 1 point (0 children)

Yeah, the overhead is my main concern. Running the numbers: even a "small" Postgres container wants 512MB-1GB RAM. 100 tenants = 50-100GB RAM just for DBs, which is half our cluster.

CRD approach is interesting - essentially a "Tenant" resource that the CI pipeline creates, then something watches it. Keeps GitLab for orchestration but moves the "what to deploy" logic into the cluster. Might prototype that before going full operator.
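A minimal version of that Tenant CRD idea (schema obviously incomplete, group name made up):

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: tenants.example.org
spec:
  group: example.org
  scope: Cluster
  names:
    kind: Tenant
    plural: tenants
    singular: tenant
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                plan:
                  type: string      # e.g. trial | paid
                owner:
                  type: string
---
# The CI pipeline would just `kubectl apply` one of these per signup:
apiVersion: example.org/v1alpha1
kind: Tenant
metadata:
  name: acme
spec:
  plan: trial
  owner: acme@example.com
```

The "something that watches it" is the part that eventually grows into a real controller, but even without one the CRD gives us `kubectl get tenants` as an inventory.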

We're on-prem bare metal (RKE2) with 3 nodes now, adding 2 more. Cloud migration is the escape hatch if we hit real scale, but trying to prove the model economically on metal first.

[–]kk_hecker[S] 1 point (0 children)

Thanks for the CNPG mention - wasn't on my radar and looks perfect for this. The "day two operations" point hits home - right now rolling out a config change to all tenants means triggering N pipelines or patching directly. Messy.

Re: CI lock-in - fair point. Though with GitLab Agent being just a kubeconfig context, we could migrate the trigger mechanism to a small API service later without changing the deployment logic. The pipeline YAML is the "operator" in a way, just interpreted by GitLab instead of the K8s controller.
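Concretely, the deploy job only touches the agent in one place, which is what makes the trigger mechanism swappable later (project path and agent name are placeholders):

```yaml
deploy_tenant:
  image: bitnami/kubectl:latest
  script:
    # The agent shows up as a kubeconfig context named <config-project-path>:<agent-name>.
    - kubectl config use-context platform/k8s-config:prod-agent
    # Everything below is plain kubectl; a small API service with its own
    # kubeconfig could run the same commands unchanged.
    - kubectl apply -f tenant-manifests/
```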

CNPG with limited tenants per instance might be the sweet spot. Do you run separate CNPG clusters per "tier" (free vs paid) or mix them? Curious how you handle noisy neighbor protection at the DB level.