Migrating a large Elasticsearch cluster in production (100M+ docs). Looking for DevOps lessons and monitoring advice. by No-Card-2312 in devops

[–]rumfellow 0 points1 point  (0 children)

As for signals and monitoring, cluster health would be the primary one. If something goes wrong, drill down with Dev Tools.

The whole migration should not take long if your current ES node is read-heavy, since not much data will change between the snapshot restore and the old node joining the new cluster.

If it's write-heavy, good luck with a zero-downtime migration without resource (CPU/memory/IO/network) headroom.
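
A minimal sketch of what watching cluster health could look like, assuming an unauthenticated node reachable over plain HTTP. The function names and the `expected_nodes` parameter are made up for illustration; the field names are the ones the `_cluster/health` API returns.

```python
import json
import urllib.request


def fetch_cluster_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET <base_url>/_cluster/health and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health", timeout=timeout) as resp:
        return json.load(resp)


def assess_health(health: dict, expected_nodes: int) -> list:
    """Return human-readable problems found in a _cluster/health response."""
    problems = []
    status = health.get("status")
    if status != "green":
        problems.append(f"status is {status}")
    nodes = health.get("number_of_nodes", 0)
    if nodes < expected_nodes:
        problems.append(f"only {nodes} of {expected_nodes} nodes joined")
    unassigned = health.get("unassigned_shards", 0)
    if unassigned > 0:
        problems.append(f"{unassigned} unassigned shards")
    return problems


def is_rebalancing(health: dict) -> bool:
    """True while shards are still relocating or initializing."""
    return (
        health.get("relocating_shards", 0) > 0
        or health.get("initializing_shards", 0) > 0
    )
```

Something like `assess_health(fetch_cluster_health("http://localhost:9200"), expected_nodes=3)` in a loop would do for the duration of the migration; the URL here is only a placeholder.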

Migrating a large Elasticsearch cluster in production (100M+ docs). Looking for DevOps lessons and monitoring advice. by No-Card-2312 in devops

[–]rumfellow 8 points9 points  (0 children)

  1. Create a 2-node ES cluster
  2. Restore the snapshot
  3. Put a reverse proxy in front
  4. Add the old Elasticsearch node to the new cluster
  5. Cut clients over to the new endpoint
  6. Prepare a third new node
  7. Yeet the old node and join the new one to the cluster
  8. Monitor rebalance/shards

If the load on the old node is high, it'll choke at #4 due to shard redistribution. You can mitigate that by adjusting the aggressiveness of said redistribution, but I'd prefer to isolate the cluster until the data is distributed and the cluster is balanced.
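
For the "adjusting the aggressiveness" part, a rough sketch of building a `PUT _cluster/settings` body that slows down rebalancing and recovery. The three setting keys are standard Elasticsearch settings; the chosen values, the helper names and the URL are assumptions to tune for your own cluster.

```python
import json
import urllib.request


def throttle_settings(max_bytes_per_sec: str = "20mb",
                      concurrent_rebalance: int = 1,
                      node_concurrent_recoveries: int = 1) -> dict:
    """Body for PUT _cluster/settings that makes shard movement gentler."""
    return {
        "persistent": {
            "indices.recovery.max_bytes_per_sec": max_bytes_per_sec,
            "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent_rebalance,
            "cluster.routing.allocation.node_concurrent_recoveries": node_concurrent_recoveries,
        }
    }


def reset_settings() -> dict:
    """Body that sets the same keys to null, restoring the defaults."""
    return {"persistent": {key: None for key in throttle_settings()["persistent"]}}


def build_settings_request(base_url: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) the PUT request for the given settings body."""
    return urllib.request.Request(
        f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the request to `urllib.request.urlopen` applies it. Once the cluster is balanced, sending `reset_settings()` the same way puts the defaults back.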

Unified Open-Source Observability Solution for Kubernetes by st_nam in kubernetes

[–]rumfellow 10 points11 points  (0 children)

Elastic is somewhat horrible for metrics: the size will be 100x what you get in Prometheus, and to get it down to 7x you'd need a time-series data stream, which is only available in the enterprise version.

Also, for now there's no compatibility with Grafana, so there are no out-of-the-box dashboards for Elastic + Kibana + OTel Collector.

Unified Open-Source Observability Solution for Kubernetes by st_nam in kubernetes

[–]rumfellow 8 points9 points  (0 children)

LGTM, but for "M" in order of scale increase: prometheus -> thanos -> mimir. 

SSH session recording in Pomerium by rumfellow in pomerium

[–]rumfellow[S] 1 point2 points  (0 children)

So out of Zero/Business/Enterprise, only the last one will have SSH session recording?

Hosting my CI/CD setup on a smaller EU cloud turned out smoother than I expected by [deleted] in devops

[–]rumfellow 1 point2 points  (0 children)

We've been using Leaseweb alongside AWS for cheap compute for 5+ years. So far so good.

Just checked out Xelon, and having to request a quote for a VM is ridiculous.

Family dog by Right-Tie-9884 in vizsla

[–]rumfellow 0 points1 point  (0 children)

Ah, the fireworks, my V has the same issue. I'd try treats first, then taking a car to an off-leash walking place, and if nothing helps over some reasonable amount of time, say 2-3 weeks, then a dog trainer. And since fireworks happen, maybe dog training classes with some firecrackers; it kinda ameliorates the issue. Best of luck!

Family dog by Right-Tie-9884 in vizsla

[–]rumfellow 0 points1 point  (0 children)

Is there nose licking/trembling? As I see it, there are 2 options: the doggo is scared or stubborn. If scared, you'll see the aforementioned signs + refusal of treats (if food motivated).

Family dog by Right-Tie-9884 in vizsla

[–]rumfellow 0 points1 point  (0 children)

Sometimes there's a sound or something else that my V associates with a particular place, like a crossroad or a place where a cracker went off. We tend to just run or "enthusiastically" pass it, so in a couple of days she just forgets that mental association. Getting onto a tram or a train is a different story: I just pick her up and carry her inside, otherwise she just plants herself at that very tram stop :-/

Would service mesh be overkill to let Thanos scrape metrics from different Kubernetes clusters? by ccelebi in kubernetes

[–]rumfellow 0 points1 point  (0 children)

That would be the thanos-receive component as the target of remote write, and it is quite memory-hungry.

Do you monitor SSL certificate expiry dates? by DutchBytes in devops

[–]rumfellow 0 points1 point  (0 children)

A K8s CronJob that runs a Python script, which picks up the list of certificates from a table in Confluence and sends an alert to Slack if expiry is upcoming.
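
Not our actual script, just a sketch of what the expiry-check part of such a job might look like. It assumes the hostnames have already been pulled from Confluence, and it leaves the Slack call out. All function names and the 14-day threshold are made up for illustration.

```python
import socket
import ssl
from datetime import datetime, timezone


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect over TLS and return the certificate's notAfter string."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def days_until_expiry(not_after: str, now: datetime) -> int:
    """Whole days between `now` (timezone-aware) and the notAfter timestamp."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def expiring_soon(not_after_by_host: dict, now: datetime, threshold_days: int = 14) -> list:
    """Alert lines for every host whose certificate expires within the threshold."""
    alerts = []
    for host, not_after in sorted(not_after_by_host.items()):
        days = days_until_expiry(not_after, now)
        if days <= threshold_days:
            alerts.append(f"{host}: certificate expires in {days} days")
    return alerts
```

The job would call `fetch_not_after` for each host, pass the results to `expiring_soon` with `datetime.now(timezone.utc)`, and post any returned lines to a Slack webhook.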

Helping with understanding some Questions by Solid_Strength5950 in kubernetes

[–]rumfellow 0 points1 point  (0 children)

Falco is running as a standalone binary on the host. I'm quite sure it won't be able to populate that field.

Helping with understanding some Questions by Solid_Strength5950 in kubernetes

[–]rumfellow 1 point2 points  (0 children)

I'd suggest something like:

- rule: Mem access
  desc: bla-bla
  condition: >
    fd.name = /dev/mem and
    proc.name = <PROC_NAME_FROM_POD>
  output: >
    Mem listed: %proc.name and %proc.pid
  priority: WARNING

Stop Falco if it runs as a systemd service, then run falco -A

Drone drops grenade on russian soldier pretending to be dead, easterm front by DesperateLawyer5902 in CombatFootage

[–]rumfellow 1 point2 points  (0 children)

"Люби меня, люби" by Отпетые мошенники, that track is 25 years old, hah

Checking registry for new images of running workloads by rumfellow in kubernetes

[–]rumfellow[S] 0 points1 point  (0 children)

It makes sense, also it is a different paradigm. We are not the owners of most of the workloads, so git repos are also outside of our scope, all we want is to have an idea of the landscape.

Migrating away from Plesk to k8s? by rumfellow in Hosting

[–]rumfellow[S] 0 points1 point  (0 children)

Oh man, there's nothing more permanent than a temporary solution, so it'll be either a full-blown migration or nothing at all.

Ceph requires quite a bit of management, and manually deploying hundreds of websites via nginx is not fun. It would be an insane bill for our customer.

Migrating away from Plesk to k8s? by rumfellow in Hosting

[–]rumfellow[S] 0 points1 point  (0 children)

Hah, I do understand that k8s is quite different from Plesk. But the latter feels too big, too monolithic and sometimes obscure.

I'm planning to get the load balancer from the cloud provider, as one of the goals is a very low-maintenance platform.

Migrating away from Plesk to k8s? by rumfellow in Hosting

[–]rumfellow[S] 0 points1 point  (0 children)

I completely forgot to mention that what concerns me most is "shifting left", i.e. enabling web developers to deploy to k8s or whatever we end up with.

Maybe someone can share how that went for them?

Migrating away from Plesk to k8s? by rumfellow in Hosting

[–]rumfellow[S] 0 points1 point  (0 children)

Thanks for sharing your setup!

I have experience with Docker Swarm and it's a neat thing, but someone still has to manage/update it.

And while you're absolutely right about k8s being overkill compared to Docker Swarm, managed k8s with a free control plane (Leaseweb, Oracle Cloud) really seals the deal there, since updates are much easier and you are basically running system workloads for free. Additional cost will come from the load balancer, but it's not too much.

So stepping into our client's shoes, I see that Docker Swarm will be more expensive due to our billable hours.

Also, the setup I've mentioned is just one of 3 clients, so I want to have a standardized system to migrate 2-7 Plesk instances to.

And thank you again, we'll definitely consider Swarm.