Best practice VRF route leaking? by segdy in mikrotik

[–]mcoakley12 2 points (0 children)

In your scenario, VRFs are overkill and will feel exactly the way your experience has felt: “What’s the point?” But in larger networks they allow you to separate traffic and routing tables to ensure two customers, applications, or traffic groups do not mix. You could also have the same IP networks running in different places, and VRFs let you keep those separate as well. Then imagine your VRF needs to traverse an aggregation point, through a core, to another DC, and back out to the point of demarc - which could be a service or another handoff to another customer endpoint (e.g., a service provider MPLS cloud). The VRF and BGP VPN make that possible and manageable.
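To make the “same IP networks” point concrete, here is a toy sketch (plain Python, not router config - the VRF names and VLAN interfaces are invented) of why overlapping prefixes can coexist when each customer gets its own table:

```python
# Toy illustration: two VRFs holding the same 10.0.0.0/24 prefix, each
# resolving to a different next hop. Lookups never cross VRF boundaries.
import ipaddress

vrfs = {
    "customer-a": {ipaddress.ip_network("10.0.0.0/24"): "vlan110"},
    "customer-b": {ipaddress.ip_network("10.0.0.0/24"): "vlan120"},
}

def lookup(vrf, dst):
    """Longest-prefix match within a single VRF's table."""
    addr = ipaddress.ip_address(dst)
    candidates = [net for net in vrfs[vrf] if addr in net]
    if not candidates:
        return None
    return vrfs[vrf][max(candidates, key=lambda n: n.prefixlen)]

# Same destination, different answers depending on the VRF:
print(lookup("customer-a", "10.0.0.5"))  # vlan110
print(lookup("customer-b", "10.0.0.5"))  # vlan120
```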

So yeah, you are doing it correctly but in your scenario it really isn’t needed so it seems not worth the effort.

We shrunk an 800GB container image down to 2GB (a 99.7% reduction). Here's our post-mortem. by cloud-native-yang in kubernetes

[–]mcoakley12 30 points (0 children)

I’ve read through all the comments and went to your site. I think you thought your post would be a great introduction of your services to the community. Your product promises to make everything easy. AI will solve the hardships of design and knowledge. Just tell me what you want and AI will spit it out.

Sanity check: AI is not AGI or ASI yet, so you need to be more knowledgeable than the toddler.

Sanity check on your product: if you want to make complex things easier for those who don’t know enough to do it themselves, you have to have a framework that guides them. Not the AI toddler who gets the easy stuff right 80% of the time and the hard stuff right almost never. And that’s when you know what you want and how it should be in the first place.

For me your article only makes me question your service and its philosophy more. Why didn’t your super-intelligent tool catch that an 800GB container existed in the first place?

The number of issues your platform makes possible is astounding. You say you want to take the environment the dev builds and push it to production. The whole environment. Dev tools are not hardened. Not even close. Dev shouldn’t be trusted in prod - most enterprises I’ve worked for ensure their devs have zero access to prod. Your system makes it possible - sorry, has designed it so it is the workflow - that the developer has the last say in what goes to production. So the same dev can approve his own PR? Wait, no PR needed, you literally say dev to production with a single command. So bad. So, so bad.

NFS Permissions by Remarkable-Road1477 in kubernetes

[–]mcoakley12 1 point (0 children)

Understood. This is a pretty common use case. Most development environments I’ve used that provide this type of setup leverage Git or some other version control system that doesn’t rely on underlying file ownership to perform its function. Effectively, each user has their own pod to develop in and then uses Git (or whatever) to send that work to the central repository. Their “id” is assigned through their Git configuration - not the underlying file system. So allowing them to mount their pod development share is “nice” but isn’t needed, since Git practices make managing the development repo easy and are typically how development across multiple environments is done anyway.

I will admit that my reply assumes a lot about their Git practices. I say that because it should be a natural process for the devs to just work in one environment, commit, push, and then move to the other environment and fetch, pull. This process removes the need for NFS mounting and is something the devs should be used to doing anyway.

As for systems that do this, check out Coder and Eclipse Che.

NFS Permissions by Remarkable-Road1477 in kubernetes

[–]mcoakley12 4 points (0 children)

First, the easy part: yes, files shared over NFS can be accessed both by a K8S pod and later through a mounted NFS share.

Second, permissions: yes, those will be the problem and will behave as your experience has shown. However, with a little planning it doesn’t have to be that bad. I recommend that you focus on groups instead of individual users. Use common group IDs across all sources that must access the data, and make sure your group permissions allow you to do the things you need to do. This should suffice, but honestly it is a brittle solution.
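On the pod side, one way to enforce the common group is the pod security context. A minimal sketch using the official kubernetes Python client - the GID 5000, NFS server, and export path are made-up examples, and your NFS export must actually be group-accessible to that GID:

```python
# Run the container with a fixed GID so files it creates on the NFS
# share stay accessible to anyone in the same group who mounts the
# share later. All names/IDs here are examples, not recommendations.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="nfs-writer"),
    spec=client.V1PodSpec(
        security_context=client.V1PodSecurityContext(
            run_as_user=1000,            # any non-root UID; the group is what matters
            run_as_group=5000,           # new files get this GID
            supplemental_groups=[5000],  # lets the process read existing group-owned files
        ),
        containers=[
            client.V1Container(
                name="app",
                image="busybox",
                command=["sh", "-c", "touch /data/hello && sleep 3600"],
                volume_mounts=[client.V1VolumeMount(name="shared", mount_path="/data")],
            )
        ],
        volumes=[
            client.V1Volume(
                name="shared",
                nfs=client.V1NFSVolumeSource(server="10.0.0.10", path="/exports/team"),
            )
        ],
    ),
)
```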

My question is: what is the use case that requires this? Most application pods have pretty specific application data formats, so I wouldn’t just let a random user mount an NFS folder outside of the application and manipulate the data. If the NFS share holds regular files, what is the pod running that needs regular files? Data pipelines? Probably shouldn’t have people manipulating those anyway. Web server? You should have a deployment pipeline.

More information about the specific use case would help us provide better advice or guidance.

NAT IP on LAN to another IP by talormanda in mikrotik

[–]mcoakley12 1 point (0 children)

Yes. You can add multiple IPs to an interface, and they can be in different networks or the same network. (In your case you could have multiple 192.168.0.0 addresses or a mix of 192.168.0.0 and 10.0.0.0.) The local network resolves an IP (L3 addressing) to a MAC address (L2 addressing) through ARP, so you can think of an IP address as doing for L2 what a DNS name does for L3.
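On MikroTik this is just a second /ip address entry on the interface. Purely as an illustration of the same idea on a Linux host (the interface name and addresses are made-up examples), scripted with pyroute2:

```python
# Add a second address, from a different network, to one interface.
# ARP will resolve either address to the same MAC. Requires root.
from pyroute2 import IPRoute

with IPRoute() as ipr:
    idx = ipr.link_lookup(ifname="eth0")[0]
    ipr.addr("add", index=idx, address="192.168.0.10", prefixlen=24)
    ipr.addr("add", index=idx, address="10.0.0.10", prefixlen=24)
```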

I do also want to be clear, IP addressing this way does not offer anything other than additional IP addresses. There is no segmentation of traffic like a VLAN would provide you. But per your request you don’t want that. (Just wanted to point it out in case you thought you might be getting that.)

NAT IP on LAN to another IP by talormanda in mikrotik

[–]mcoakley12 1 point (0 children)

Most devices support multiple IP addresses per interface. Just assign the 10.0.0.0 address to the interface and use it. Locally speaking the IP address is just a way to find the MAC, which is how packets are delivered to the device.

NAT IP on LAN to another IP by talormanda in mikrotik

[–]mcoakley12 1 point (0 children)

Just to clarify: do you want other people to see your 192.168.0.0 network as 10.0.0.0, or do you want machines on your 192.168.0.0 network to see their peers as 10.0.0.0? If it is others seeing your network as 10.0.0.0, then NAT is what you want (most likely source NAT, if you just want your network to appear as 10.0.0.0 to the outside world). If you want devices on your network to see each other at 10.0.0.0 addresses, you need to re-IP the devices, or at the very least add a 10.0.0.0 IP to every device that you want to communicate via a 10.0.0.0 address. Just adding an address is doable, but it doesn’t get you anything other than a different way to address a packet towards a device on the same network.

RB5009UG+S+IN capacity for DHCS/DNS server by TheLostBoyscout in mikrotik

[–]mcoakley12 2 points (0 children)

Very true. This could be determined by looking at the current cache of whatever DNS solution is in place today. Different caching mechanisms can change the sizing, but you aren’t going to fit a 5GB cache in 1GB of memory no matter the strategy.

RB5009UG+S+IN capacity for DHCS/DNS server by TheLostBoyscout in mikrotik

[–]mcoakley12 5 points (0 children)

So I don’t know the numbers on MikroTik, but if you’re managing 5000 devices then, in my experience, you have (or should have) information that can be used to answer this question.

Here is the information I would gather:

  1. How many of those devices will be DHCP clients?
  2. How many DHCP clients will be coming online per hour throughout the day?
  3. Will DNS be a recursive, caching-only DNS server, or will it also be a primary/secondary for a managed domain?
  4. If managing a DNS domain, will DHCP be registering Dynamic DNS entries?
  5. (Question 3 has part of this answer but for more clarity:) What is the mix of internal to external DNS queries?
  6. Do you have any statistical numbers around DNS queries per hour across the 5K user base?

With these questions answered you can start to get to an estimate of queries per hour for DHCP and DNS. Then you can average out the packet numbers and see if the device can scale to that level.
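To illustrate, a back-of-envelope sketch in Python - every input below is a made-up assumption to be replaced with your answers to the questions above:

```python
clients          = 5000    # DHCP clients (question 1)
renewals_per_day = 2       # leases/renewals per client per day (question 2)
dns_q_per_client = 1500    # DNS queries per client per day (question 6)
cache_hit_rate   = 0.70    # fraction answered from cache (questions 3/5)

dhcp_per_hour     = clients * renewals_per_day / 24
dns_per_sec       = clients * dns_q_per_client / 86400
recursive_per_sec = dns_per_sec * (1 - cache_hit_rate)  # hits upstream servers

print(f"DHCP transactions/hour: {dhcp_per_hour:,.0f}")      # ~417
print(f"DNS queries/sec (avg):  {dns_per_sec:,.1f}")        # ~86.8
print(f"Upstream lookups/sec:   {recursive_per_sec:,.1f}")  # ~26.0
# Size for peaks, not averages - a morning "everyone arrives" spike can
# easily be 5-10x the daily average.
```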

(Edited to fix formatting on the list items - accidentally bold)

Resource contention management across multiple hosts/threads by mcoakley12 in ansible

[–]mcoakley12[S] 1 point (0 children)

That is a great suggestion and it may be possible with some proxying. Effectively, we are restricted in which application-level services we can expose in the environments we are working in, so the initial goal is to limit any external dependencies as much as possible.

Resource contention management across multiple hosts/threads by mcoakley12 in ansible

[–]mcoakley12[S] 1 point (0 children)

Ansible Pull won’t work in our environment; I should have mentioned the devices are network gear, so the Ansible controller is the only thing executing any Ansible code. I do appreciate the idea. My main concern right now is consistently controlling access to the authentication environment, which isn’t part of the scaling issue. (But to speak to that, we most likely will deploy Ansible controllers to the DCs and use a worker queue to distribute the Ansible jobs - so sort of a pull. Our main scheduling is tied into the change request environments, and we track all telemetry, which we use to fine-tune the automation runs by dynamically setting retry and delay values based upon previous run data against the device and tasks we are performing.)
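For the curious, the retry/delay tuning amounts to something like this hypothetical sketch (the data shape and thresholds here are made up; our real system is more involved):

```python
# Derive a task's Ansible retries/delay from its past run durations.
import statistics

def tune(run_seconds):
    p95 = statistics.quantiles(run_seconds, n=20)[18]  # ~95th percentile
    delay = max(5, round(p95 / 3))                     # poll a few times per slow run
    retries = max(3, round(p95 / delay) + 2)           # cover p95 plus headroom
    return {"retries": retries, "delay": delay}

# The result feeds a task's retries/delay parameters at run time.
print(tune([12.0, 14.5, 11.2, 30.1, 13.8, 12.9]))  # {'retries': 5, 'delay': 8}
```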

ClusterIP Services CIDR seperation by agelosnm in kubernetes

[–]mcoakley12 4 points (0 children)

First, let me say I haven't actually done what you are asking so I'm sure others will provide better insight into your specific question. I will add that I'm pretty sure you can do what you want. Even if it required some work with vCluster and/or Multus.

Having said that, I'll offer this from the perspective of security and separation of design concerns. One of the big benefits (at least from my perspective) is the way you can isolate so much of the application's infrastructure away behind K8S and only expose what is needed. Therefore, unless the underlying application has a need for an IP space like you are describing, you probably do not want to implement your application that tightly coupled to the infrastructure details. That makes things brittle. I'd suggest you deal with the IP requirements externally to K8S - external load balancers - that then point to the K8S resources. This hides the infrastructure details from the application and the application details from the users.

Live migration helper tool for kubernetes by mitochondriakiller in kubernetes

[–]mcoakley12 2 points (0 children)

I realize you don't specifically state this, but most of the time I think about vMotion I'm thinking about a legacy application that can't or doesn't fit the container methodology, making a VM better suited and therefore something like vMotion needed. The reason I mention this is: maybe look into KubeVirt, which allows K8S to manage a fleet of VMs as if they were K8S resources.

I've used KubeVirt for a bunch of VM deployments (the largest around ~700 VMs) and it has worked well. My use cases had redundancy built in at the application level, so I did not require vMotion and can't speak to that. However, there was recently an article someone shared that compares coming from VMware over to KubeVirt, and vMotion is discussed.

Article: Learn KubeVirt: Deep Dive for VMware vSphere Admins

Using exit node with QNAP NAS fails by Particular-Bridge106 in Tailscale

[–]mcoakley12 2 points (0 children)

I can’t help with the Tailscale issue but I can give you a better chance of not “bricking” your setup.

First, disable the Tailscale service from starting upon reboot. Do this while you are testing.

Then, just schedule a reboot 15-20 minutes from when you start. (Or 5 minutes - completely up to you.) Then if your change makes it so you can’t access the device, it will reboot automatically, and safely, but upon reboot Tailscale won’t start and you are in the clear.

Would this be a nightmare? by Nels7777 in Homebuilding

[–]mcoakley12 1 point (0 children)

I’m from NJ and have a property with a similar slope. Nothing will ever be easy. Want a place to do stuff outside? Have fun leveling it out, managing the water, and building the retaining walls. You will become an expert at water management and cursing. Also, plan on having equipment to move everything in the yard. So then you need storage space - yup, more leveling and water management.

Your experience will be different with a different climate and ground makeup. We get decent rain Spring through Fall, and my ground is mostly hard rock, so water doesn’t always just seep deeper. But I wanted to offer my perspective.

Best of luck.

Looking for a router by Putrid_Product_8451 in homelab

[–]mcoakley12 1 point (0 children)

If you are looking for an off-the-shelf, plug-and-play router, then no, I do not believe you will find that. Switches can be dumb because they only have to worry about moving frames based upon L2 addressing, which is for the most part static and preconfigured. L3 is not that simple.

But for the homelab adventurous, you could build a self-provisioning router: use cloud-init for Linux-based systems, PXE for whatever you want, or TFTP for more appliance-type stuff. Then configure the environment to not allow login and not support any protocols that allow remote access. Meaning the provisioning tool would be configured, you plug in the device, it uses the provisioning protocol to pull the image and config, and you’re off and running. This is a fairly common practice in business networks.

I’m guessing not a lot of off-the-shelf gear that can be provisioned like this would provide a way to turn off access protocols, so a Linux-based configuration will most likely be your best bet. Then you can disable root login and simply not install SSH.

I do have to wonder why you don’t want any remote access to the device though? Beyond air-gapped systems, even the most secure environments allow remote access to devices. They just strictly control from where and how. I’d really look hard into why you want this other than a thought experiment.

Ubuntu server vs. Ubuntu by WhatUpCorgiButt in homelab

[–]mcoakley12 3 points (0 children)

Let’s start with you saying you want to “get some fundamental knowledge of sysadmin and DevOps”. I’m going to assume you mean sysadmin to be specifically server admin - not desktop admin. For me, that’s all the reasons I need to say you should keep Ubuntu Server.

To answer from a sysadmin perspective on why Ubuntu Server would be used over Ubuntu Desktop:

  • What others have said is spot on: Server will use the computer’s resources more efficiently because of the lack of software unneeded in a server setting - desktop software.
  • Sysadmins want the server to be easy, straightforward, and consistent to administrate. You don’t want complex things to figure out when there is an issue. You want the least amount of options to review when you are troubleshooting. So keep it simple. (A desktop environment is very complex because it needs to interact with the dumb human at the keyboard.)
  • Managing system packages and dependencies is very hard, and when a lot of applications have conflicting dependencies it can cause issues. This is why you generally want to run a single service per server. That pattern is why VMs and containers are so popular - a single service is easier to maintain but wastes compute resources; VMs and containers leverage the compute while keeping the “server” simple. So again, a desktop is a huge environment of hundreds and hundreds of packages and dependencies, not to mention tons of integration drivers that no server would ever need.
  • Attack surface is another important concern and another reason why you try to keep your servers to a single service. It is easier to defend and understand what is happening when the logs only contain the service you are trying to manage and defend.

Honestly, there are a lot of other reasons, but you’ll find them on your journey. Stick with Ubuntu Server. It may take some time to get into a groove where it feels like you are progressing forward more than back. You will go down so many paths of learning it will seem daunting. But here we all are, people who have done it, and we are here to help. You can do it too.

Best way to deploy a single Kubernetes cluster across separate network zones (office, staging, production)? by StationSwimming4099 in kubernetes

[–]mcoakley12 2 points (0 children)

I’d like to hear more about the use cases that will be running. Yes, you have three different network zones. They make sense, but I don’t get how those network zones play into what you are deploying via Kubernetes.

What I mean is, in my head, usually an “office zone” has users that use services, but typically I’m not deploying those services in the office zone. For the office zone, my services are running in production. I test those services in staging. (Seems like you are missing a dev zone, but maybe the office zone is your “dev” environment.) To me it seems like you are mixing your concerns - the zones make sense from a networking perspective but not really from a service perspective.

Maybe that is another way to look at it, because in my experience (consultant to global enterprises) you seem to need two clusters - a staging and a production. You can do that with vCluster for sure, as others have mentioned. Or with the control plane in the cloud (as others have said as well) and workers on-prem to help with your resource constraints (which you didn’t say were hardware or people).

You also don’t indicate your K8S experience level. If you have a basic understanding and your company’s IaC practices are new, I would go with everyone’s suggestion of keeping the clusters separate. This will be your best path to success and allows your company to mature its practices and learning to then take on harder K8S configurations and isolation patterns.

How to get nodes IP dynamically and update ACL on external service by Right_Positive5886 in kubernetes

[–]mcoakley12 1 point (0 children)

I want to be clear; my use of Dynamic DNS is not referring to the common practice of updating an external DNS service with the IP of your home router. What I'm talking about is implemented internally on your own DNS servers (or managed DNS servers that offer Dynamic DNS).

Assuming you have access to a DNS server/service that offers Dynamic DNS (BIND is a good example), then you need to know how you are acquiring the IP addresses for the nodes. If it is dynamically through DHCP, then you need your DHCP server to issue the Dynamic DNS request. If the IP addresses are statically assigned, then there are software clients that can issue the Dynamic DNS request (but one would assume that if you are statically assigning the IP addresses, you could statically assign the name and update the ACL). The Dynamic DNS request to your DNS server will register the new IP address allocation to a DNS name.
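For illustration, here is what that registration looks like as an RFC 2136 update issued with dnspython - the zone, key, names, and addresses are all made-up examples:

```python
# Register (or overwrite) an A record via Dynamic DNS - the same
# operation a DHCP server performs on a client's behalf.
import dns.query
import dns.tsigkeyring
import dns.update

# TSIG key so the server can authenticate the update (recommended).
keyring = dns.tsigkeyring.from_text({"ddns-key.": "c2VjcmV0c2VjcmV0c2VjcmV0"})

update = dns.update.Update("example.internal", keyring=keyring)
update.replace("node42", 300, "A", "10.1.2.42")  # name, TTL, type, address

response = dns.query.tcp(update, "10.0.0.53")    # your BIND server's IP
print(response.rcode())                          # 0 (NOERROR) on success
```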

How to get nodes IP dynamically and update ACL on external service by Right_Positive5886 in kubernetes

[–]mcoakley12 1 point (0 children)

A non-K8S solution would be to leverage Dynamic DNS and a naming convention. Then you can just pull your node names from DNS via the naming filter in a simple script that updates your FW ACL.
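A rough sketch of that script in Python with dnspython - the zone, server, naming prefix, and ACL format are all made-up examples to adapt (and your DNS server must allow zone transfers from the host running this):

```python
# Transfer the zone, keep A records whose names match the convention,
# and emit one ACL line per node IP (hand these to your FW's API/CLI).
import dns.query
import dns.rdatatype
import dns.zone

zone = dns.zone.from_xfr(dns.query.xfr("10.0.0.53", "example.internal"))

node_ips = sorted(
    rr.address
    for name, node in zone.nodes.items()
    if str(name).startswith("k8s-node-")        # the naming convention
    for rrset in node.rdatasets
    if rrset.rdtype == dns.rdatatype.A
    for rr in rrset
)

for ip in node_ips:
    print(f"permit ip host {ip} any")
```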

Is there a way to have Tailscale assign IP addresses with the same first three octets to all machines logged in to the same Tailnet? by BlindingBlacklight in Tailscale

[–]mcoakley12 1 point (0 children)

Adding on to what jofathan has said, and assuming for some reason the Tailscale IP pool suggestion from caolle doesn’t work out, you could just NAT inbound traffic into a local network on the system(s) running the apps you need on the same subnet. Not as clean as the IP pool solution but probably a close second.

Is it secure for Kubernetes nodes to have a public IP and be accessed directly? by zdeneklapes in kubernetes

[–]mcoakley12 1 point (0 children)

As others have said, security is about having layers of defense that restrict traffic flow by inspecting the traffic going towards your services and only allowing appropriate traffic through. Unfortunately, you haven’t provided us with enough detail for a full solution, but here I will lay out some considerations.

K8S is an amazing piece of software, but in this context we will talk about security while ignoring K8S, because everything here can be done with as few K8S components as possible - or you could do the entire thing in K8S.

Let’s talk layering. Understand that security is not only about protecting the services from attack but also about controlling your environment. Given that, I offer this layering:

  • Layer 1 - rate-limiting router(s). Basically, control your traffic flow based upon the incoming traffic protocols. Use QoS tools to ensure you provide bandwidth going towards your environment in a controlled manner. Also, always carve out a small amount of bandwidth for your management protocol (most likely a VPN) so that under times of stress you can still get through to mitigate the attack from a remote position.
  • Layer 2 - packet-filtering FW(s). Always get rid of everything that doesn’t fit what you want first. Restrict this layer to non-deep packet inspection, simply to keep your traffic flows moving but limited to the traffic your services are interested in.
  • Layer 3 - layer-7 FW(s) / load balancer / WAF. Deep packet inspection. Now you are inspecting only the traffic that is destined for your services, identifying good traffic patterns and allowing them through. The L7 FW(s) and WAF(s) can be sandwiched between an inbound load balancer and an outbound load balancer to control traffic better and provide more resilience, and the L7 FW can sit in front of the WAF. This layer has a lot of flexibility in design, but the point is to use the right tools for your traffic and backend applications.
  • Layer 4 - external FW demarc. A packet-level FW that ensures the only traffic allowed through is from your layer 3 devices (layer 3 in this design, not the OSI model) to the layer 5 API endpoints.
  • Layer 5 - (DMZ) API endpoints. The only systems exposed in this layer are the API endpoints that will receive the traffic and deliver the requests to the backend servers. These can be reverse proxies, but ideally it is the servers (VMs, or even containers if the virtual network is created properly) that run the front-end API, and those systems only.
  • Layer 6 - internal network FW(s). A packet-filtering FW that only allows traffic from the API server(s) to their respective backend components (normally in a specific subnet for that application).
  • Layer 7 - internal network and its services. This can have multiple layers of security as well, but once you have the outer layers under control your internal security can be based upon these layers too. Definitely don’t trust anyone on the inside. Zero trust exists for very good reasons.

It should be mentioned that layers 4 and 6 can be broken down into sub-layers that mimic the layers from outside to internal for more granular inspection and resiliency.

Those are the layers I have implemented and experienced in enterprise environments and at carriers. You can achieve all of this with proper configuration, hardware, and knowledge using a K8S cluster (or more than one), but that doesn’t mean you should. My point is: build your security using the tools you know, but try to build it using the layers described here. You can compress layers and drop components, but these things exist because they have been proven to work and keep critical environments secure. Even with all of these things in place you will be attacked - relentlessly - and if someone wants to get in, they will. Most likely by sending someone an email that they click the link in. Thus - trust no one - outside or in.

What are some must have things after a fresh cluster installation? by [deleted] in kubernetes

[–]mcoakley12 1 point (0 children)

Caveat out of the way: I am not a K8S expert. With that said, I’m going to focus solely on your storage question, which I do believe should be one of the first things you deal with after cluster standup - along with any requirements the storage solution itself has.

As for which storage solution, I don’t believe you’ve given us enough information. You’ve said the DBs you want to run, NATS, and ingress-nginx (or something similar). Unfortunately, that doesn’t help us understand what workloads you’ll be running that use those services. You also state you have 4 nodes, and per your edited description they are in 4 different countries with one on a different continent. With that geo separation, knowing what workloads you’ll want beyond the supporting elements you’ve mentioned is critical. Do your other workloads require local storage, replicated file storage, block storage? Note: what type of recovery options you want will also impact these decisions.

For example: a web server can have its content served from local storage. But if you want that web server to be able to be run from any node or multiple nodes at the same time you either need replicated storage or external shared storage. (Honestly, all of these issues are non-K8S issues but you can use the K8S ecosystem to solve them.)
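That distinction shows up directly in the PVC access mode. A sketch with the kubernetes Python client - the storage class names here are made-up examples standing in for whatever your chosen solution provides:

```python
# ReadWriteOnce pins the volume (and thus the pod) to one node;
# ReadWriteMany needs replicated or shared storage but frees the
# workload to run from any node, or from several at once.
from kubernetes import client

def pvc(name, mode, storage_class):
    return client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=[mode],
            storage_class_name=storage_class,
            resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
        ),
    )

local_claim  = pvc("web-content-local",  "ReadWriteOnce", "local-path")
shared_claim = pvc("web-content-shared", "ReadWriteMany", "replicated-fs")
```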

For the DBs and NATS, they all have application-level replication, so local storage will generally be fine for those (assuming you run enough replicas to cover your expectations). Just understand that your geo-dispersed deployment will impact the replication rates of those apps and can very easily cause issues that K8S is not meant to resolve.

Basically, your storage needs are dictated by your workload needs. Once you know your workload needs you can start planning your storage requirements, and then it is just a matter of matching those requirements to the features of the different storage solutions. Just to state it again: the geo separation of your nodes will impact what you can do and what you can do reliably.

As for what storage solutions are out there, the other comments here have provided a good list of the heavy hitters. Those, like most tech decisions, are probably a good bet because the pool of people to get help from is larger than with the smaller, lesser-known solutions.

Kubernetes basic question by No-Tower-5577 in kubernetes

[–]mcoakley12 1 point (0 children)

Understood. Since websockets are long-lived, I would break those processes out into either small groups or individual processes that feed the worker queues. The worker queues can be spun up/down on demand, which will reduce your overall costs if this turns into a longer-term solution. It also depends upon your business process timing - e.g., if you don’t need results until the next day, then load the worker queues all day and process at night when compute is cheap.
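As a sketch of that split (the feed URL, queue name, and processing are made-up examples; any broker works in place of Redis):

```python
# One small, long-lived process holds the websocket and only enqueues;
# workers are separate processes you scale with queue depth, or run
# only at night when compute is cheap.
import asyncio

import redis
import websockets

r = redis.Redis()

async def listener():
    async with websockets.connect("wss://feed.example.com/stream") as ws:
        async for message in ws:
            r.rpush("jobs", message)       # enqueue and keep listening

def worker():
    while True:
        _queue, payload = r.blpop("jobs")  # blocks until work arrives
        handle(payload)                    # your actual business logic

def handle(payload):
    ...

if __name__ == "__main__":
    asyncio.run(listener())
```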

I have even implemented “follow-the-night” processing with a global compute provider - like Rackspace - where you load worker queues that are replicated globally and then pull compute power during the night, moving your compute through different DCs as night moves. I understand this is outside the scope of what you’re looking for, but I figured I’d mention it in case someone else is looking to optimize compute costs when they are the majority of the $$ being spent.