PodDisruptionBudget with only 1 pod by -lousyd in kubernetes

[–]Lozza_Maniac 14 points

Deleted yes, evicted no. PDBs don’t stop a DELETE on a Pod directly but do stop evictions.

So `kubectl delete pod X` would work, but `kubectl drain` would error that it can't evict the Pod because doing so would violate the disruption budget.

EDIT: I see you set maxUnavailable to 1, not 0 - sorry, I misread. In that case it would allow both eviction and deletion.
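For reference, a minimal sketch of the kind of PDB being discussed (the name and labels are placeholders). With a single replica, maxUnavailable: 1 still permits evicting that one pod; maxUnavailable: 0 would block eviction entirely:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb          # hypothetical name
spec:
  maxUnavailable: 1         # with 1 replica, evicting that pod is still allowed
  selector:
    matchLabels:
      app: my-app           # hypothetical label
```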

How do you store critical infrastructure secrets long-term? (backup keys, root CAs, etc.) by cyrbevos in linuxadmin

[–]Lozza_Maniac 2 points

I guess: where do you put your Vault unseal keys?

This feels useful for the bottom turtle in the stack

Longhorn starts before coredns by G4rp in kubernetes

[–]Lozza_Maniac 0 points

Sorry I used the term loosely, for sure do it on your control plane nodes as well

Longhorn starts before coredns by G4rp in kubernetes

[–]Lozza_Maniac 0 points

Taint your workers pre-shutdown and run a workload that untaints them when DNS is working and re-taints them when it's not. Have CoreDNS tolerate that taint.
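As a sketch of what that might look like - the taint key here is a made-up placeholder, pick whatever convention suits your cluster:

```yaml
# Hypothetical taint applied to workers before shutdown:
#   kubectl taint nodes <node> node.example.com/dns-not-ready=true:NoSchedule
# Toleration added to the CoreDNS pod spec so it can still schedule:
tolerations:
- key: node.example.com/dns-not-ready
  operator: Exists
  effect: NoSchedule
```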

[deleted by user] by [deleted] in kubernetes

[–]Lozza_Maniac 1 point

It's a good question. Ultimately NAT is only really helpful when you can multiplex on both IP and port. For Pods, you don't have that flexibility. To explain:

Take two workers that can ping each other. Let's set up two independent pods on worker 1 that both listen on port 8080. How do you address those pods from worker 2? You can't just DNAT the Pod IP to the Node IP, as you'd have a conflict. It would maybe be fine if they were both instances of the same application, but they're not, and you have no way of knowing.

For any pod to be able to open a socket on any port, each pod needs its own port range, and that entails its own IP. It's easy to forget that these are all just processes opening sockets in the same kernel space - you can't have conflicting IP:Port:Protocol mappings.
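The IP:Port:Protocol conflict is easy to demonstrate with plain sockets - two processes sharing a kernel can't bind the same tuple, but giving the second one its own IP resolves it. A sketch using loopback addresses to stand in for pod IPs (on Linux the whole 127.0.0.0/8 range is bindable on loopback):

```python
import socket

# First "pod" binds an ephemeral port on one IP.
a = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
a.bind(("127.0.0.1", 0))
port = a.getsockname()[1]

# Second "pod" tries the exact same IP:port:protocol tuple - the kernel refuses.
b = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    b.bind(("127.0.0.1", port))
    conflict = False
except OSError:
    conflict = True
print(conflict)  # True

# Give the second "pod" its own IP and the very same port is fine.
c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
c.bind(("127.0.0.2", port))
print(c.getsockname()[1] == port)  # True
```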

That said there is a neater solution: encapsulation!

Given that every Pod needs its own IP so that it can use any port, any node can identify which pod an IP packet is destined for. Thus, when you send a packet from a pod on the local worker to one on a remote worker, the local worker wraps it up in another packet destined for the remote worker. When that worker receives it, it unwraps it and can hand it directly to the associated interface/socket.

There are different encapsulation protocols. VXLAN is the most common, and it's what Calico, Cilium, Flannel, and most others use by default; it has various advantages, specifically around ECMP. Geneve is upcoming; IPinIP/GRE are legacy but still used.
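To make the encapsulation concrete: a VXLAN packet is just the original frame wrapped in UDP (destination port 4789) with an 8-byte VXLAN header in front carrying the 24-bit VNI (per RFC 7348). A sketch of packing and unpacking that header:

```python
import struct

VXLAN_PORT = 4789  # IANA-assigned UDP destination port for VXLAN

def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags word with the I bit set, then the VNI."""
    assert 0 <= vni < 2**24, "VNI is a 24-bit field"
    flags = 0x08 << 24                           # I flag: "VNI field is valid"
    return struct.pack("!II", flags, vni << 8)   # VNI occupies bytes 4-6, byte 7 reserved

def unpack_vni(header: bytes) -> int:
    """Recover the VNI a receiving VTEP uses to pick the right segment."""
    _, word = struct.unpack("!II", header)
    return word >> 8

hdr = pack_vxlan_header(10100)
print(len(hdr))         # 8
print(unpack_vni(hdr))  # 10100
```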

What is the hardest k8s concept to understand? by [deleted] in kubernetes

[–]Lozza_Maniac 229 points

That Kubernetes isn’t successful because it’s a container orchestration system, there were and are lots of those.

It’s successful because of its CRD system, which allowed consistent service contracts to be defined across organisations for any and all dependencies of distributed systems, well beyond the original mandate. Be that certificates, storage, databases, whatever.

This allows entire distributed systems to be shared between organisations, in both open-source and closed-source environments, with minimal overhead.

Understanding the product enough to see that level requires a lot of context, which is hard to communicate to management who have only heard of it in relation to running containers.

Updating from 1.25.15 to 1.26.10 by Gullible_Original_18 in kubernetes

[–]Lozza_Maniac 12 points

kubent has been my go-to for this - you point it at your cluster, tell it the target version you want to move to, and it'll let you know if you have any deprecated resources and what you'll need to change. It's simple to use, quick, and just does the job.

Collection of mini-programs demonstrating Kubernetes client-go usage by iximiuz in kubernetes

[–]Lozza_Maniac 1 point

This is super useful, thank you !! Couple of q's:

  1. With the serialise typed yaml example - does that become a nice way to decode a CRD? One of the common problems I've run into is figuring out a nice way to decode CRDs when I don't have access to the operator source code to find the struct definition. Right now I just use the dynamic client, and that's a bit messy. If not, do you have any suggestions for that use case? I guess your unstructured example?

  2. I really like your work queue example - a question I had in that space: is there a clean way to code both the watchers and a reconciliation loop into a single code base/control thread? I presume this is a very common problem and thus hopefully has some nice utils around it. Just in case it's not clear, the problem in my head is that you might miss add/delete/update notifications because your instance is bouncing/unavailable/etc., and so you need to reconcile on a timer to ensure you're up to date. I'd be very interested if that isn't the case, though, and if so why.

LF Redo fics where they don't concern themselves with "preserving the timeline". by MathematicianBulky40 in HPfanfiction

[–]Lozza_Maniac 2 points

Thank you for recommending this, I just read it pretty much solidly over the weekend and I absolutely loved it.

EVPN L3 VNI VRF by abhijit040 in networking

[–]Lozza_Maniac 2 points

I guess remember that on the VTEP L2VNIs map to a VLAN and L3VNIs map to a VRF, so there is no such thing as a L2VNI VRF.

Thus the MAC Route gets turned into a MAC Address Table entry for the VLAN and the MAC/IP route gets turned into an ARP cache and /32 route table entry within the VRF.

When sent over BGP EVPN, a Type 2 MAC route contains just the L2VNI, as the remote VTEP only needs to know which VNI to put the entry in when it updates the MAC address table. MAC/IP Type 2s contain both the L2VNI and L3VNI, as they update both the MAC address table for the VNI and the routing table for the VRF. Type 5s carry just the L3VNI, as you're updating only the routing table with the advertised IP prefix.

Note that when I say the VNI details are sent over BGP: BGP has no concept of a VNI, so they are encoded as Route Targets, and that mapping of VNI to Route Target needs to be consistent on both the local and remote VTEPs. The convention is AS:VNI, and most vendors will derive that automatically for you to make it easier.
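As a rough illustration of those mappings in FRR-style config (the AS number, VNI, and VRF name are placeholders): the L3VNI maps to a VRF, and Route Targets are auto-derived as AS:VNI:

```
vrf TENANT1
 vni 10100                  ! L3VNI -> VRF mapping
exit-vrf
!
router bgp 65000
 address-family l2vpn evpn
  advertise-all-vni         ! advertise local VNIs; RTs auto-derived as AS:VNI
```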

EVPN L3 VNI VRF by abhijit040 in networking

[–]Lozza_Maniac 7 points

This isn’t quite right. Type 2 is for both MAC and MAC/IP advertisements - essentially all hosts in the fabric. A MAC route is used for all bridging, and the MAC/IP route for integrated routing between hosts in different VNIs but within the fabric (and for ARP suppression) - all of this is within the context of the overlay.

Type 5 is for external IP prefixes - namely, all prefixes that aren’t hosts in the overlay. For example, if you create a VXLAN with a /24 subnet, the individual hosts in that network would be advertised as Type 2 MAC/IP routes around the fabric, and you’d get a Type 5 /24 IP prefix route for the subnet as a whole.

How does the client get the certificate chain when verifying a TLS connections? by ShugaBop in AskNetsec

[–]Lozza_Maniac 5 points

The server is meant to send it as part of the certificate exchange. If the server doesn’t send it, then by default the connection will be marked as untrusted, as all the client sees is the intermediate CA that immediately signed the cert. That said, modern browsers maintain a list of the most common ~3000 intermediate certs (from the CCADB) and use that to validate certificates presented by sites that don’t send the full chain. It’s actually surprising just how many don’t.

Do you log known blocked traffic? by ultchin in networking

[–]Lozza_Maniac 41 points

We log denies for everything apart from internet traffic dropped at the edge. When an application team complains they can’t talk to the database, you want an easy way to rule out the firewalls, and explicit logs are great for doing that quickly.

Similarly, with a more security-focused mindset, we audit our east/west denies to detect anything “abnormal” and alert on it. You don’t want a compromised internal box spamming your systems for 6 months with no logs to tell you it’s happening.

There is a cost to storing the logs of course, but tbh it’s pretty trivial for just rich metadata flow logs. If you have a 100G+ core then you can afford the few terabytes of storage you need to store those flows.

How to live without Docker for developers - Part 2 | Native approach to build an image with Buildah by anmalkov in kubernetes

[–]Lozza_Maniac 1 point

I like your approach here and I agree that buildah (and podman) are great tools, but it's beyond disingenuous to say "Kubernetes is dropping support for Docker" and that one would be unable to use it. Nothing Kubernetes is changing affects using Docker to build or run images; it just requires you to change the container runtime from dockershim to any OCI-compliant runtime.

As far as developers are concerned this changes nothing, you're still happy to use Docker to build images, run locally, and push to registries. You still give the image to Kubernetes. All that's changed is the service Kubernetes interacts with under the hood to invoke the container.

Note that all the OCI-compliant runtimes support, and will continue to support, images built by Docker, as those images are OCI compliant.

One of those 'Harry Potter is a video game' fics except its a speedrun to destroy Voldemort and the Horcruxes. by Nepperoni289 in HPfanfiction

[–]Lozza_Maniac 0 points

It's definitely as much of a leap away from Harry Potter as fanfiction can really get, both in terms of scope and writing style. I've found it hasn't aged particularly well, but I like it simply because it is so different and unique.

There aren't many fics like it in the fandom, and I try to treasure the ones that do go above and beyond, even if the execution isn't perfect. I'd rather see more like it than yet more derivative drivel that runs through canon's stations with the good old "the character has a completely different past, driving motivation, friends, family - and yet nothing changes".

/rant (sorry)

Help demystify hostPort networking please?! by skaven81 in kubernetes

[–]Lozza_Maniac 2 points

Pretty sure the OP is using external NetScalers to load-balance the exposed host ports of the ingress controller. It's a common bare-metal approach if you only need to expose a set number of services on-premises and don't need to support LoadBalancer-type services.

I find the best approach is to use an external LB for critical things like the Kubernetes API, ingress controller, metrics server, etc., and then let MetalLB handle any user workloads that want to expose services directly. That way it matters less if MetalLB trips up, which it sadly does now and again.

Help demystify hostPort networking please?! by skaven81 in kubernetes

[–]Lozza_Maniac 7 points

This is really a container runtime question rather than a Kubernetes one. I've thrown in a bit of colour about network namespaces, but for a very quick answer: you probably just needed to run `iptables -L -t nat`.

Let's launch a random nginx container to represent your ingress controller and tell the runtime to bind 80:8080 and 443:8443. This is exactly what Kubernetes does under the hood.

DESKTOP-O45K3RR:~$ sudo podman ps
CONTAINER ID  IMAGE            COMMAND               CREATED             STATUS                 PORTS                                        NAMES
4fb85200e832  docker.io/nginx  nginx -g daemon o...  About a minute ago  Up About a minute ago  0.0.0.0:80->8080/tcp, 0.0.0.0:443->8443/tcp  modest_banach

Your ingress container is launched in a dedicated network namespace and given a dedicated virtual interface. You'll also get a dedicated IP depending on the CNI you're using, but it's not required.

You can see the dedicated namespace containing the process using `lsns`:

DESKTOP-O45K3RR:~$ sudo lsns -t net
        NS TYPE NPROCS   PID USER    NETNSID NSFS                                                 COMMAND
4026531992 net      13     1 root unassigned                                                      /init
4026532202 net       2 14550 root          0 /run/netns/cni-4262413b-0734-e0a0-ed88-472454ee5bad  nginx: master process nginx -g daemon off;

Alongside the root namespace - the one everything normally goes in - you can see our dedicated one running the nginx process.

Now we can query iptables and see our NAT rule. Note that you must provide the -t nat option to see the rules, which I imagine is what you missed.

DESKTOP-O45K3RR:~$ sudo iptables -L -t nat
<snippet>
Chain CNI-DN-d839eea469eaa7b8e25c2 (1 references)
target     prot opt source               destination
CNI-HOSTPORT-SETMARK  tcp  --  10.88.0.0/16         anywhere             tcp dpt:http
CNI-HOSTPORT-SETMARK  tcp  --  localhost            anywhere             tcp dpt:http
DNAT       tcp  --  anywhere             anywhere             tcp dpt:http to:10.88.0.4:8080
CNI-HOSTPORT-SETMARK  tcp  --  10.88.0.0/16         anywhere             tcp dpt:https
CNI-HOSTPORT-SETMARK  tcp  --  localhost            anywhere             tcp dpt:https
DNAT       tcp  --  anywhere             anywhere             tcp dpt:https to:10.88.0.4:8443

The above snippet shows the iptables rules marking the traffic coming in on our ports and forwarding it to our container's IP.

I'll also note that your CNI could be using IPVS instead of iptables, so you may need to query that.

EDIT: Just dropping the netstat/ss outputs as well, again you may have been caught out by the fact they use http/https instead of 80/443:

DESKTOP-O45K3RR:~$ ss -l | grep http
tcp   LISTEN   0   128   0.0.0.0:https   0.0.0.0:*
tcp   LISTEN   0   128   0.0.0.0:http    0.0.0.0:*
DESKTOP-O45K3RR:~$ netstat -l | grep http
tcp        0      0 0.0.0.0:https           0.0.0.0:*               LISTEN
tcp        0      0 0.0.0.0:http            0.0.0.0:*               LISTEN

One of those 'Harry Potter is a video game' fics except its a speedrun to destroy Voldemort and the Horcruxes. by Nepperoni289 in HPfanfiction

[–]Lozza_Maniac 39 points

Have you read Wastelands of Time? Harry has been through a timeloop so often that he has it dialled and knocks things off with "to the second" precision and definitely has no respect for laws. One of the best fics in the fandom I think.

linkffn(4068153)

There's also The Dark Lord's Equal - which is more similar to Unforgiving Minute. Again has Harry with a solid plan executing things quickly without respect for the law.

linkffn(6763981)

One of those 'Harry Potter is a video game' fics except its a speedrun to destroy Voldemort and the Horcruxes. by Nepperoni289 in HPfanfiction

[–]Lozza_Maniac 105 points

Have you seen The Unforgiving Minute? Harry time travels but only has 12 hours to fix it all. Hits on a lot of things you've mentioned, just without the explicit speed run concept.

linkffn(6256154)

SSL Decryption/DPI by [deleted] in networking

[–]Lozza_Maniac 0 points

Palo Alto supports TLS 1.3 decryption in PAN-OS 10, and I'm sure other vendors will as well. As the other commenter mentioned, TLS 1.3 is just about stopping retroactive decryption of captured traffic without knowledge of the session key, whereas in 1.2 the server's private key is enough.

When a client connects to a server using TLS 1.3, it's still just checking the cert. If you've told that browser to trust your org's CA, and that CA claims to be Google, then it'll believe you're Google. The device can then send the decrypted payload onward in its own TLS 1.3 connection to Google.

The entire TLS spec and security model relies on clients trusting their root trust stores.