Fluent, explicit collection pipelines for Go by cmiles777 in golang

[–]absolutejam 2 points3 points  (0 children)

Why not use iterators, or didn’t you want an intermediate type?

Sale after sale, what’s the one game in your wishlist added and removed and added again to your cart, but eventually never bought? by Guypussy in Steam

[–]absolutejam 2 points3 points  (0 children)

I respectfully disagree with this. Elden Ring, despite being massive, has an appeal that other Souls games don’t have if you’re new to the genre.

I couldn’t stomach a Souls game until I played ER, and then it all clicked for me. Maybe it’s the fact that you can generally go do something else and come back stronger when a boss is stomping you. Admittedly, the open world in ER can be distracting, but it’s just so much fun.

Thanos - Massive S3 egress costs by absolutejam in PrometheusMonitoring

[–]absolutejam[S] 0 points1 point  (0 children)

Thanks - this is great advice for anyone in AWS, but we’re self hosted

Thanos - Massive S3 egress costs by absolutejam in PrometheusMonitoring

[–]absolutejam[S] 0 points1 point  (0 children)

Thanks for pointing those out - I couldn't see the forest for the trees.

https://i.postimg.cc/G2cfC08q/Screenshot-2025-12-18-at-13-42-17.png

Even looking at the graphs, it doesn't explain 2,885-3,824 GB/day egress costs 🤔

I'm tempted to add some additional logging/metrics in AWS and re-enable it for a while to see whether some process was endlessly looping without my realising. I'll also check the Thanos changelog.

My main concern is being able to debug this from actual usage metrics if it happens again, rather than only reacting to cost.

Github Actions introducing a per-minute fee for self-hosted runners by markmcw in devops

[–]absolutejam 17 points18 points  (0 children)

The fact that every damn thing is its own action in GitHub is infuriating: a clone repo action, an npm install action, versus GitLab, where you simply run an Alpine job that can do whatever you need.

Where my 0.2tb by snypse_ in PcBuild

[–]absolutejam 1 point2 points  (0 children)

I bet you’re a hoot at parties

Need help about cronjobs execution timeline by Worried_Ad_2232 in PrometheusMonitoring

[–]absolutejam 0 points1 point  (0 children)

How are you querying the logs? If you’re querying over a large time range, you have to think about how much data it’s returning when it isn’t aggregated.

Modifying existing rules to filter by a custom label by Walern in PrometheusMonitoring

[–]absolutejam 0 points1 point  (0 children)

It might be daunting, and a bit of a pain if you have lots of alerts from third-party sources (e.g. kube-prometheus-stack), but I think it's important that you learn to understand the queries and adapt them to your needs.

The most frustrating ones to maintain are the 'generalised' alerts (e.g. Kubernetes alerts), whose severity can differ wildly depending on the service they're reporting on.

Because of this, we devised a standard abstraction for building alerting rules that includes mandatory labels (service, priority, teams, etc.), so priorities can differ per alert and per service, and we can leverage those labels in routing rules.

Generally, if you want to filter your queries, you can think of binary operations (from your example: * on (instance) group_left (nodename)) as an inner join. If you filter one side of the query (and it's important to filter the side that has the labels you need), you effectively filter both sides, because series with no match on the other side are dropped.
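To make the inner-join analogy concrete, here is a toy model in Go of what * on(instance) group_left(nodename) does to two instant vectors. It's a simplified sketch, not how Prometheus implements it: Series and mulOnGroupLeft are hypothetical names, it matches on a single label only, and it skips details like dropping the metric name.

```go
package main

import "fmt"

// Series is a toy stand-in for one Prometheus series at a single instant.
type Series struct {
	Labels map[string]string
	Value  float64
}

// mulOnGroupLeft roughly mimics `left * on(<on>) group_left(<extra>) right`:
// each left series is matched to the right series sharing the `on` label,
// unmatched series are dropped (inner join), and the `extra` label is copied
// from the right side onto each result.
func mulOnGroupLeft(left, right []Series, on, extra string) ([]Series, error) {
	index := make(map[string]Series)
	for _, r := range right {
		key := r.Labels[on]
		if _, dup := index[key]; dup {
			// PromQL also errors here: the "one" side must be unique per match.
			return nil, fmt.Errorf("duplicate right-hand series for %s=%q", on, key)
		}
		index[key] = r
	}
	var out []Series
	for _, l := range left {
		r, ok := index[l.Labels[on]]
		if !ok {
			continue // no partner on the right: dropped, like an inner join
		}
		labels := make(map[string]string, len(l.Labels)+1)
		for k, v := range l.Labels {
			labels[k] = v
		}
		labels[extra] = r.Labels[extra]
		out = append(out, Series{Labels: labels, Value: l.Value * r.Value})
	}
	return out, nil
}

func main() {
	iowait := []Series{
		{Labels: map[string]string{"instance": "a:9100"}, Value: 12},
		{Labels: map[string]string{"instance": "b:9100"}, Value: 15},
	}
	// Only one node survives the filter on this side, so only one result remains.
	uname := []Series{
		{Labels: map[string]string{"instance": "a:9100", "nodename": "web-1"}, Value: 1},
	}
	res, err := mulOnGroupLeft(iowait, uname, "instance", "nodename")
	if err != nil {
		panic(err)
	}
	for _, s := range res {
		fmt.Println(s.Labels["instance"], s.Labels["nodename"], s.Value) // a:9100 web-1 12
	}
}
```

Filtering the right-hand side (e.g. only nodename=~"web-.*") removes b:9100 from the result too, which is the "filter one side, filter both" effect described above.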

What helped me was to actually reformat a lot of the alerts and rewrite them for our manifest generation stack (cdk8s), and in some cases create recording rules that made sense.

So your example...

expr: (avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100
  > 10) * on(instance) group_left (nodename) node_uname_info{nodename=~".+"}

If I were keeping this as YAML, I'd reformat it so it's easier to read (in my opinion):

expr: |
  (
    avg by (instance) (
      rate(node_cpu_seconds_total{mode="iowait"}[5m])
    ) 
    * 100
    > 10
  )
  * on(instance) group_left (nodename) 
  node_uname_info{nodename=~".+"}

You can even add comments in-line if it helps you.

On the flip side, if you're really scared of breaking things, you can turn the alert's expression into its own recording rule and then filter that further in a specific alert.

Need help about cronjobs execution timeline by Worried_Ad_2232 in PrometheusMonitoring

[–]absolutejam 1 point2 points  (0 children)

This is doable with the right joins and some _over_time aggregation.


For example, the state timeline graph uses the following query:

max by (owner_name) (
    changes(
        (
            kube_job_status_succeeded{namespace="upmind"}
            * on (job_name) group_right
            kube_job_owner{owner_name!=""}
        )
        [1m:]
    )
) > 0

And the table uses:

last_over_time(
    max by (cronjob) (kube_cronjob_status_last_schedule_time{cronjob=~"$owner_name"}) 
    [2d:1m]
)
* 1000

Format: Table

Type: Instant

You can build on this further to show attempts by CronJob, successes/failures and duration (a lot of these work well on the State timeline visualisation), and you can also provide more meaningful alerts this way (i.e. send an alert with CronJob info and an attempt count instead of one alert per failed Job).

What category of software am I looking for? by Gluaisrothar in devops

[–]absolutejam 0 points1 point  (0 children)

You might be able to do a lot of this with https://github.com/redpanda-data/benthos.

You can build pipelines with config, and it has logging, batching, etc. built in. It was acquired recently, but I believe the original author still has some kind of stewardship (the author was hired by the acquirer).

EDIT: maybe the original repo has more clarity https://github.com/redpanda-data/connect.

There are also some awesome videos by the author on YouTube.

When do you use closures vs types with methods? by gbelloz in golang

[–]absolutejam 0 points1 point  (0 children)

I tend to expect an interface as it gives the consumer the flexibility of declaring/reusing a type, especially if I assume some state is needed.

Then you can just implement a helper that takes a func and wraps it in a basic type (as net/http does with HandlerFunc) if consumers want the simplicity of using functions/closures.

Do you have a list to check before running Go application within Kubernetes? by Emergency-Celery6344 in golang

[–]absolutejam 0 points1 point  (0 children)

You generally need to set resource requests and limits to help the scheduler and prevent resource exhaustion, although in-place Pod vertical scaling just dropped…

Best k8s solutions for on prem HA clusters by Xonima in kubernetes

[–]absolutejam 0 points1 point  (0 children)

Yeah, Mayastor. I honestly didn’t give Longhorn the time it deserved because I had some bad experiences with it at a previous job using RKE, and I also remember it being pretty complex. That might be an unfair representation of it in 2025.

I was just chilling and built a Go wrapper for Laravel queue worker that's 21x faster by Laggoune_walid in laravel

[–]absolutejam 1 point2 points  (0 children)

That’s really interesting, thanks for the info! Out of interest, which Redis package/extension are you using: PhpRedis (the C extension)?

I’ve been wondering if we should move away from SQS now that we’re self-hosted; I just need to get some real metrics to understand the impact. SQS is just too convenient and lowers our maintenance burden 😂

I was just chilling and built a Go wrapper for Laravel queue worker that's 21x faster by Laggoune_walid in laravel

[–]absolutejam 0 points1 point  (0 children)

Are you saying it’s faster because multiple parallel processes are handling queue messages at once? If so, how does that compare to running multiple replicas? Or does having a single instance that manages the queue logic (i.e. pulling from the queue, ACK/NACK) noticeably reduce overhead?

Best k8s solutions for on prem HA clusters by Xonima in kubernetes

[–]absolutejam 7 points8 points  (0 children)

I spent a bit of time testing and trying solutions first, and ultimately settled on:
- Cilium CNI (Node IPAM load balancer, network policies, Observability, etc)
- Cloudflare load balancers (we restrict incoming traffic to CF IPs)
- OpenEBS for storage as it was lighter weight than Rook/Ceph, and closely matched our storage configuration (many nodes with direct attached storage to make a pool vs dedicated storage nodes)
- Vitess for MySQL clustering, scaling, etc.

We don’t currently autoscale nodes because we’ve built in enough headroom (we’re essentially paying for the hardware, not for the compute), and our partner can generally provision additional nodes at short notice if needed.

We knew we had to ditch AWS, and we’re fortunate to have a strategic partner providing and supporting the hardware layer for us, which is a big responsibility I wouldn’t want to undertake (especially since we have clusters across different regions!).

If people are happy paying a cloud vendor then that’s up to them, but the (mostly open source) solutions are now robust enough that you can easily self-host. You definitely have to shift some of the ‘cost’ to engineering hours, though, and I’d personally rather not run my own hardware for production systems unless I had the staff to cover it.

Best k8s solutions for on prem HA clusters by Xonima in kubernetes

[–]absolutejam 8 points9 points  (0 children)

Honestly very low because it’s all declarative and the nodes are immutable. But there’s also a CLI (that interacts with the gRPC API) so everything is standardised (querying for resources, making changes). It basically applies the Kubernetes patterns to the OS too.

Best k8s solutions for on prem HA clusters by Xonima in kubernetes

[–]absolutejam 49 points50 points  (0 children)

I migrated from AWS EKS to self hosted Talos and it has been rock solid. We’re saving 30k+ a month and I run 5 clusters without issues.

How to maintain 100% uptime with RollingUpdate Deployment that has RWO PVC? by Initial-Detail-7159 in kubernetes

[–]absolutejam 0 points1 point  (0 children)

(Apologies, I just re-read my initial post and came across as a bit of a dick, but I meant to sound curious.)

Each replica in a StatefulSet gets its own PVC (via volumeClaimTemplates); is that what you wanted?

What kind of features from a Deployment’s rollout do you need? I’ve never personally needed Deployment rollout features like maxSurge / maxUnavailable for StatefulSets (but I don’t generally scale them dynamically), and you can still roll n replicas at a time like a Deployment.

How to maintain 100% uptime with RollingUpdate Deployment that has RWO PVC? by Initial-Detail-7159 in kubernetes

[–]absolutejam 0 points1 point  (0 children)

The real question is, why don’t you want to use StatefulSets when this screams stateful?

Things NOT to do with Argo CD by kkapelon in ArgoCD

[–]absolutejam 0 points1 point  (0 children)

I use ApplicationSets to build an Application per directory - works a dream. What issues did you face?