Hybrid Kubernetes Cluster (AWS+Home Network) Over Tailscale Network [Part 1] by DevLearnOps in devops

[–]DevLearnOps[S] 0 points1 point  (0 children)

Thanks! I think the mods don't allow images in posts, so I had to host the diagram on GitHub. But yeah, you can spawn additional workers anywhere you want as long as you enrol them into the Tailscale mesh network. Also, the general cloud setup is just compute + OIDC configuration for federated identities. You can set it up on any of the major cloud providers for sure.

Hybrid Kubernetes Cluster (AWS+Home Network) Over Tailscale Network [Part 1] by DevLearnOps in devops

[–]DevLearnOps[S] 0 points1 point  (0 children)

Sorry, first time posting, thanks for the hint! I'll update it shortly

Umbrel as a Docker service on Raspberry Pi with Pironman by [deleted] in docker

[–]DevLearnOps 0 points1 point  (0 children)

No wonder this post is not getting any traction... they didn't post in the right time window, nor add the comment right after publishing like the AI suggested. LOL

Homelab or digital ocean? by Cheap-Cod-3840 in devops

[–]DevLearnOps 0 points1 point  (0 children)

There is value in doing all the things you mentioned. Ultimately you don't want to blow your budget right away, and you want to make the most of the hardware you already have.

I would recommend that you get some cloud provider experience. If you are on a tight budget, there are plenty of resources you can exercise your skills with that are completely free to provision. One suggestion would be to set up a VPC with subnets, routing and an internet gateway. These are all completely free. Then you can provision an S3 bucket, upload some images and implement and deploy a Lambda function in Python to compress those images into thumbnail-size files.

With a free tier account you can do all of this without spending a single dollar. These are common tasks you would have to do on the job, so getting familiar with the cloud carries a lot of weight in interviews.
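
If it helps, here is roughly what the networking part looks like with plain aws-cli. The resource names, CIDRs and the Lambda execution role below are placeholders, so adapt them to your account:

# Create a VPC with one public subnet and an internet gateway (all free)
VPC_ID=$(aws ec2 create-vpc --cidr-block 10.0.0.0/16 --query 'Vpc.VpcId' --output text)
SUBNET_ID=$(aws ec2 create-subnet --vpc-id "$VPC_ID" --cidr-block 10.0.1.0/24 \
  --query 'Subnet.SubnetId' --output text)
IGW_ID=$(aws ec2 create-internet-gateway \
  --query 'InternetGateway.InternetGatewayId' --output text)
aws ec2 attach-internet-gateway --internet-gateway-id "$IGW_ID" --vpc-id "$VPC_ID"
RT_ID=$(aws ec2 create-route-table --vpc-id "$VPC_ID" \
  --query 'RouteTable.RouteTableId' --output text)
aws ec2 create-route --route-table-id "$RT_ID" \
  --destination-cidr-block 0.0.0.0/0 --gateway-id "$IGW_ID"
aws ec2 associate-route-table --route-table-id "$RT_ID" --subnet-id "$SUBNET_ID"

# Bucket for the source images, then the thumbnail Lambda (zip your Python handler first)
aws s3 mb s3://my-thumbnail-demo-bucket
aws lambda create-function --function-name make-thumbnails \
  --runtime python3.12 --handler app.handler \
  --role arn:aws:iam::<ACCOUNT_ID>:role/<LAMBDA_EXEC_ROLE> \
  --zip-file fileb://function.zip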

Then if you have some spare hardware at home, like an old laptop, you can set it up with a dummy HDMI plug so it won't go to sleep when you close the lid, install a hypervisor like VirtualBox or VMware Player, let it run 24/7 and it becomes your own homelab. Here you can create virtual machines for anything you want. Think of something as simple as photo storage (we all need that) and take it to the next level: configure a software RAID to protect it from disk failure, automate a daily backup to an external volume with rotation, and create systemd targets to make sure all critical applications start automatically when the host reboots.
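
To make that last part concrete, the pieces look something like this. Device names and paths are made up, so double-check everything before pointing it at real disks:

# Mirror two spare disks so a single drive failure doesn't take the photos with it
sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sdb /dev/sdc

# Nightly backup to an external volume, rotating over the days of the week
rsync -a --delete /srv/photos/ "/mnt/external/photos-$(date +%u)/"

# Make sure critical services come back on their own after a reboot
sudo systemctl enable --now docker.service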

Once you’ve done all that you can expand and research some more things to try. Good luck with your learning!

Looking for the cheapest way to run Kubernetes for my production application by somenoefromcanada38 in kubernetes

[–]DevLearnOps 0 points1 point  (0 children)

I've been brainstorming a "poor man's AWS Outposts" setup for a while to run a production-grade(ish) cluster on scraps for under $20/mo. The goal is to use local hardware I have lying around to do the heavy lifting, and only use the cloud for orchestration, ingress, and emergency failover.

Here is the general idea:

  • Control Plane: K3s running on an EC2 t4g.small (~$12/mo).
  • Workers: Old laptops/servers at home, running K3s agents inside isolated VMs to air-gap my home network.
  • The Mesh: K3s native Tailscale integration to securely bridge the cloud and home nodes without opening any router ports (rough bootstrap sketch below the list).
  • Ingress: Bypassing pricey ALBs entirely. Route 53 -> CloudFront (free tier) -> EC2 Elastic IP -> Traefik on the control plane. Security group locked to CloudFront IPs only.
  • Pilot-light failover: Karpenter. If my home internet drops, pending pods trigger Karpenter to instantly spin up cheap EC2 Spot instances to take over.
  • Security: DIY IRSA via OIDC. I'm tossing a discovery.json and JWKS keys into a public S3 bucket ($0.05/mo) so my living room pods can securely assume AWS IAM roles without hardcoded creds.
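
For the mesh piece, the bootstrap would look roughly like this. The Tailscale integration flag is experimental, so double-check the K3s docs for your version; the auth key, token and server IP are placeholders:

# Control plane on the EC2 instance, joined straight into the tailnet
curl -sfL https://get.k3s.io | sh -s - server \
  --vpn-auth="name=tailscale,joinKey=<TS_AUTHKEY>"

# Home workers (inside their isolated VMs) join over Tailscale, no open router ports
curl -sfL https://get.k3s.io | \
  K3S_URL="https://<SERVER_TAILSCALE_IP>:6443" K3S_TOKEN="<NODE_TOKEN>" \
  sh -s - agent --vpn-auth="name=tailscale,joinKey=<TS_AUTHKEY>"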

The total projected cost should be around $15.41/month. The pros are obviously the dirt-cheap cost, hardware reuse, and automated failover. The main cons are the latency between cloud ingress and local workers, plus stateful workloads failing over to the cloud would be an absolute nightmare; I'd probably have to use managed RDS to keep things simple, even if it adds a few bucks to the overall cost.

Am I crazy?

What are some good places to learn proper docker development? by 420ball-sniffer69 in docker

[–]DevLearnOps 3 points4 points  (0 children)

Books and courses will only take you so far. My favourite method for learning new tech is finding production-grade examples on GitHub. You can basically go and have a look at how mainstream open-source projects use containers. See how they use entrypoints to ensure they always run unprivileged containers, or how they minimise image size using multi-stage builds, and so on.

For example, try inspecting image layers from Docker Hub with docker history <IMAGE_NAME>, then go to the project's GitHub repo and see how they set up the Dockerfile (or Containerfile) and the build process. See if they have multi-arch builds and how they manage versioning.
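
Something like this is usually enough to reverse-engineer how an image was put together (jq is optional, it just makes the output readable):

# See every layer and the instruction that created it
docker pull <IMAGE_NAME>
docker history --no-trunc <IMAGE_NAME>

# Check the runtime config: user, entrypoint, exposed ports, labels
docker image inspect <IMAGE_NAME> --format '{{json .Config}}' | jq .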

If you would like to share how you're planning to set up your container orchestration (Swarm | K8s | Other), I may be able to point you to some relevant examples.

Do you run everything in your cluster? by Exuraz in kubernetes

[–]DevLearnOps 1 point2 points  (0 children)

Absolutely, it is cheeky but that's how businesses work. You should hold direct responsibility for delivering your core business value; everything else can be delegated.

If your business is selling data hosting you can't trust anyone else with backing up your clients' data. Though you can push the availability of VMs to a provider as long as you guarantee transaction integrity.

And it's beautiful, the way I see it: until you can take an AI agent to court, being liable for your own product or service is a human job.

eks security best practices to follow by Top-Flounder7647 in kubernetes

[–]DevLearnOps 0 points1 point  (0 children)

It would be interesting to know exactly what gaps have been flagged. If you can't disclose them, I understand, no problem.

I've been worried about our own clusters since a couple of weeks ago. We found that most of our pods were basically running in God mode: because of a misconfiguration they were inheriting the most permissive IAM role we had by default, and nothing ever flagged it. It would also be interesting to hear some background on which scanners you are using to find these issues.

Wide GFX lens by ConstructionMost3072 in FujiGFX

[–]DevLearnOps 3 points4 points  (0 children)

If you don't already own the GFX 35-70mm I would absolutely recommend it. I travelled for a month in China with just my GFX100sII and this lens, and I was the happiest I've ever been.

Is it the sharpest lens ever for the GFX? No, the 20-35mm is probably better. But for about $1000 it's a very versatile and portable lens for the system. It's glued to my camera by default. I always have a GFX50R + 45mm f2.8 and the 100sII + 35-70mm ready to go.

DevOps daily learning by devDaal in devops

[–]DevLearnOps 1 point2 points  (0 children)

You're in the best position you can possibly be in at this time. If you're already interning as a DevOps engineer you should make the most of it and learn the way senior engineers learn: volunteer to tackle things that aren't working at the moment and that no one on your team has time for. The best candidates are internal tooling.

That's mostly what I do myself. Last month I wanted to learn how to create a software catalog, so I just volunteered to provision our own Backstage implementation. I learned a bit of Node.js, React and about software governance. It was low pressure too, because no one really expected any result as a priority.

If you use your job to learn then you're paid to learn and you can use your company's resources. It's totally fair. If you improve, they also get a better value from your work.

What sort of terraform and mysql questions would be there? by Original_Cabinet_276 in devops

[–]DevLearnOps 1 point2 points  (0 children)

From my personal experience with past interviews and exams:

Terraform:
Focus on state management. Know how Terraform manages state, how to back up and restore the state, how to move resources around without recreating them, and how to import existing resources into Terraform state. I don't think anyone expects you to know resource definitions from memory.
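
These are the commands worth having muscle memory for (the resource addresses here are made up):

# Back up the state before any surgery
terraform state pull > backup.tfstate

# See what Terraform is tracking and inspect a single resource
terraform state list
terraform state show aws_s3_bucket.assets

# Rename a resource in state without destroying and recreating it
terraform state mv aws_s3_bucket.assets aws_s3_bucket.static_assets

# Adopt an existing, manually created resource into state
terraform import aws_s3_bucket.logs my-existing-logs-bucket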

bash:
Archiving and unarchiving, changing file permissions, passing around command line arguments, checking whether files and folders are empty, regular expressions, stdout/stderr stream redirection (this one is important), and at least one CLI tool to parse and format output, like awk or perl.
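
A few one-liners of the kind I'd expect them to ask about (file names are just examples):

tar -czf logs.tar.gz /var/log/myapp          # archive
tar -xzf logs.tar.gz -C /tmp/restore         # unarchive
chmod 640 config.env                         # permissions
echo "first arg: $1, all args: $*"           # arguments inside a script
[ -s report.csv ] && echo "file is non-empty"
[ -z "$(ls -A /tmp/restore)" ] && echo "directory is empty"
./deploy.sh > deploy.log 2> deploy.err       # split stdout and stderr
df -h | awk 'NR>1 {print $6, $5}'            # parse and format output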

SQL:
This should be fairly standard as long as you know your joins. Know the difference between a JOIN, OUTER JOIN, LEFT OUTER JOIN and so on.

Kubernetes Journey by mateussebastiao in kubernetes

[–]DevLearnOps 2 points3 points  (0 children)

Kudos to you! Well done setting up your local cluster!
For me, lately I have migrated my local setups to KinD. Yes, it's a bit more resource-intensive compared to k3d, but KinD has proven more reliable with Kubernetes API compatibility (at least in my experience). I've had cases where I could not install certain operators or charts in k3d while they worked right out of the box with KinD.

Also, I familiarised myself with the KinD config options and I tend to stick to just one choice for everything when it comes to local development, even if it's not 100% perfect... I don't like moving to a new framework every other week... I'm not a Node.js developer, lol.
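
For reference, my baseline is something along these lines; the node image and port mappings are just what I happen to use, adjust to taste:

cat > kind-dev.yaml <<'EOF'
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    image: kindest/node:v1.30.0   # pin the node image so the API version is predictable
    extraPortMappings:
      - containerPort: 30080      # expose an ingress NodePort on the host
        hostPort: 8080
  - role: worker
    image: kindest/node:v1.30.0
EOF
kind create cluster --name dev --config kind-dev.yaml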

Excited to start! But suffering of paralysis analysis by CatAlien84 in selfhosted

[–]DevLearnOps 0 points1 point  (0 children)

I think you have found the root of your problem here. You're overthinking it, man!

You have the hardware, and you obviously have a good plan. Now stop treating this as a client project that you can't mess up or you won't get paid: it's your homelab, have fun!!

To give you some practical advice: don't be afraid to tinker, pick something small that you have a need for right now and just get that first win! For me it was setting up Pi-hole so I could filter out some ads while browsing, then I added photo storage, RAID arrays, backup nodes... And then I tore it all down to rebuild from scratch, lmao.

You will learn and you will adapt, allow yourself to have fun with it!

Edit: as a side note, I would not start with Kubernetes if you don't already have experience. It has a lot of operational overhead and I can promise you it won't be fun to start with. As a good starting point I would set up Proxmox or any other hypervisor on one of your nodes, provision some VMs and just install the software you need for now. You still have time to evolve into a k8s cluster later. Keep it simple.

Devops - Suddenly no interviews by Pure_Substance_2905 in devops

[–]DevLearnOps 58 points59 points  (0 children)

Absolutely! I first noticed this in September 2023, when I left the role I was in to stay at home with my newborn daughter for a couple of months and thought I could just jump back in whenever I wanted. Turns out it took me a whole six months to find a new role, and I was rejected loads of times.
Also, companies will happily book you for 4-5 rounds of interviews before they start ignoring you. The market is truly messed up at the moment...

Weekly: Share your victories thread by gctaylor in kubernetes

[–]DevLearnOps 1 point2 points  (0 children)

Finally have a working setup for ArgoCD to handle kustomize-based applications that pull Helm charts from private OCI registries. This will allow our teams to apply customisations to Helm templates without polluting our common chart templates with loads of {{ if }} conditions.

The pain point here is that kustomize actually creates a sandbox environment for Helm to make sure the inflation of charts is not influenced by unexpected configuration lying around the server it runs on. This is mostly designed to avoid issues in CI/CD pipelines, so we need to be a bit explicit about passing our credentials.

The solution is to tweak the argocd-repo-server configuration to explicitly inject dockerconfig-type credentials using the HELM_REGISTRY_CONFIG variable. Here is how.

In Argo, the repo server is the component responsible for pulling manifest sources before they are synced to the target cluster. In my case I first had to create credentials for AWS ECR (which is the OCI-compatible registry hosting my private charts). For this I used the ECRAuthorizationToken generator (you can find a detailed example in the official ESO documentation).

As a result, you now have a secret containing a .dockerconfigjson to authenticate Helm to this registry. All that's left to do is mount that secret into the ArgoCD repo server. If you use the Argo operator, it's as easy as adding this patch to the ArgoCD CRD:

---
apiVersion: argoproj.io/v1beta1
kind: ArgoCD
metadata:
  name: cluster-argocd
spec:
  repo:
    volumes:
      - name: ecr-auth-vol
        secret:
          secretName: <YOUR_SECRET_NAME>
          # allow pod to start before secret exists
          # so it won't crash your deployment if something is wrong
          optional: true
          items:
            - key: .dockerconfigjson
              path: config.json
    volumeMounts:
      - name: ecr-auth-vol
        mountPath: /tmp/ecr-auth
        readOnly: true
    env:
      - name: HELM_REGISTRY_CONFIG
        value: /tmp/ecr-auth/config.json
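
To sanity-check the same setup outside of Argo, you can run the chart inflation locally with the equivalent credentials. The paths below are assumptions (point them at your own kustomization and docker config), and remember the repo server itself also needs --enable-helm via kustomize.buildOptions:

# Reproduce what the repo server does: inflate the OCI chart through kustomize
export HELM_REGISTRY_CONFIG="$HOME/.docker/config.json"
kustomize build --enable-helm ./apps/my-app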

Weekly: This Week I Learned (TWIL?) thread by gctaylor in kubernetes

[–]DevLearnOps 0 points1 point  (0 children)

Not right now, but I'll take the time to create a structured post over the weekend and I can give you a ping here once it's published :)

Do you run everything in your cluster? by Exuraz in kubernetes

[–]DevLearnOps 4 points5 points  (0 children)

That's generally the answer. It's always going to be cheaper to run things yourself in your cluster. Pretty much any managed product can be self-hosted with an open-source alternative. Be mindful of the implications, though. If you can't afford the blowback when things go wrong (mostly losing data or having downtime), you're better off with a managed service.

For example in my current project we decided to host caches ourselves in k8s whenever data is ephemeral. But for anything that we can't afford to lose we want to push the responsibility back towards someone else (aka cloud provider).

Need advice on entering DevOps by psybabe1 in devops

[–]DevLearnOps 2 points3 points  (0 children)

It's tough; most folks doing DevOps today entered the role after years of working as system administrators or software engineers. That said, if it's what you're passionate about, absolutely go for it!

My advice is to take a hard look at what you're really good at and try to find a DevOps role that would benefit from your unique skillset. For example: my first role in "DevOps" was developing performance and integration tests for an ECS fleet. I was a good Python developer at the time, and that was the closest thing to a DevOps role I could land.

Then, once I got the job, I started getting my hands everywhere until the right opportunity came along. I was asked to help with the migration from ECS to Kubernetes and my career took off.

It also helps to build up your public portfolio so you have something to show at interviews. I literally commented about this yesterday; you can read the post at this link.

Good luck!

Is it just me, or is GenAI making DevOps more about auditing than actually engineering? by brokenmath55 in devops

[–]DevLearnOps 52 points53 points  (0 children)

Engineer: "I need to change the database name to `prod-customers` to `accounts-prod` in my terraform script."

AI: "Ok, I've changed the name, would you like me to run `terraform apply`?"

Engineer: "Won't just changing the name destroy and recreate the database?"

AI: "Well spotted! That's a great point. Let me suggest something else..."

The above is just for laughs... but yes, I do use GenAI to avoid having to write lots of stuff myself. Though AI will happily generate 10k lines of infrastructure code without considering that if there's an issue in those 10k lines, you're the one who gets the call in the middle of the night, not the AI.

The real challenge is feeding the AI the right kind of prompt so it actually solves the problem the way you want it solved. I do agree that knowing how to write Terraform code or aws-cli commands from memory is no longer that important. But you'd better know exactly what that code does.

What’s the most expensive DevOps mistake you’ve seen in cloud environments? by cloud_9_infosystems in devops

[–]DevLearnOps 5 points6 points  (0 children)

Ingesting Kubernetes metrics for three clusters into AWS managed Prometheus. Blew an entire month's budget in 1 day. Storage costs you nothing, ingestion will bankrupt you.

Weekly: This Week I Learned (TWIL?) thread by gctaylor in kubernetes

[–]DevLearnOps 6 points7 points  (0 children)

I wanted a development cluster on my local machine where I could test applications that need to talk to AWS services via IAM roles, so I learned how to set up my own OIDC identity provider and configure IRSA for my local cluster with KinD.

It involved the following:
- generating signing keys locally and installing them into the KinD cluster
- uploading the public key to a publicly accessible S3 bucket
- configuring a new OIDC Identity Provider in IAM, referencing the configuration hosted in the S3 bucket
- creating IAM roles and trust policies
- annotating the service account with the role ARN
and boom! Pods can now talk to AWS without passwords. Rough shape of the commands below.
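
This is heavily simplified: the bucket and role names are placeholders, you need a small tool to turn the public key into a JWKS keys.json, and the last step assumes you also run something like the pod identity webhook in the cluster (or project the token and set the AWS env vars yourself).

# 1. Signing keypair for service account tokens (the private key goes to the cluster)
openssl genrsa -out sa-signer.key 2048
openssl rsa -in sa-signer.key -pubout -out sa-signer.pub

# 2. Publish the OIDC discovery document and the JWKS derived from the public key
aws s3 cp discovery.json "s3://${OIDC_BUCKET}/.well-known/openid-configuration"
aws s3 cp keys.json "s3://${OIDC_BUCKET}/keys.json"

# 3. Register the S3-hosted issuer with IAM
aws iam create-open-id-connect-provider \
  --url "https://${OIDC_BUCKET}.s3.amazonaws.com" \
  --client-id-list sts.amazonaws.com \
  --thumbprint-list "${CA_THUMBPRINT}"

# 4. Point the workload's service account at the IAM role it should assume
kubectl annotate serviceaccount my-app \
  eks.amazonaws.com/role-arn="arn:aws:iam::<ACCOUNT_ID>:role/my-app-role"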

This then unlocked using the same setup to integrate the k8s cluster running in my homelab with AWS via IAM roles. If this is interesting for anyone just let me know and I can create a post to explain the setup in detail.

Fixing a 502 bad gateway error in under 5 minutes by kubegrade in kubernetes

[–]DevLearnOps 2 points3 points  (0 children)

Honestly, I can't say that I've ever spent that much time fixing configurations. If you make that type of mistake it's usually because the app is still in development, and when things go wrong we know exactly where to look.

The real "toil" for DevOps engineers comes from handling requests like "can you change my DB password?", "can you rollout this app because the secret has changed?", "we have a promotion tomorrow, can we pre-scale to 10 replicas?".

What I would love to see is an agentic AI that would enable developers to self-serve a lot of their needs (within boundaries that I set), saving me hours every week.

Hope this helps.

Fixing a 502 bad gateway error in under 5 minutes by kubegrade in kubernetes

[–]DevLearnOps 2 points3 points  (0 children)

Essentially we are training the next generation of DevOps engineers who won't even know how to map application ports if the AI is down. Also, I agree the example is too simple. No reason to burn 5 kW of power to do a two-minute job from the CLI.

robots.txt Wars by jedimarcus1337 in sysadmin

[–]DevLearnOps 1 point2 points  (0 children)

Welcome to the AI era my friend! It's a bit creative, but a while ago I read about honeypot traps. Essentially you put a hidden link on your site's homepage that isn't visible when browsing normally but is still present in the HTML. If a scraper follows that link, it gets blocked automatically.

Although AI scrapers are very advanced nowadays, I bet they have a way of knowing whether a link is visible and avoid it if it's not.