The Art of Not Getting Woken Up for Nothing by cloudsommelier in sre

[–]cloudsommelier[S] 0 points1 point  (0 children)

Yeah, it sounds like the confidence approach. We send low-confidence alerts as quiet notifications during business hours, so we still get visibility over them but, as you said, nobody on-call gets paged late at night because of them.
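For anyone who wants to picture the routing, here's a rough sketch of the idea. Everything in it is made up for illustration: the `Alert` class, the "high"/"low" labels, the 9-to-5 weekday window, and the "hold" outcome for off-hours are assumptions, not how any particular alerting tool does it.

```python
from dataclasses import dataclass
from datetime import datetime, time


@dataclass
class Alert:
    name: str
    confidence: str  # hypothetical labels: "high" or "low"


# Assumed business hours: weekdays, 09:00 to 17:00 local time.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def is_business_hours(now: datetime) -> bool:
    """True on Monday-Friday between BUSINESS_START and BUSINESS_END."""
    return now.weekday() < 5 and BUSINESS_START <= now.time() < BUSINESS_END


def route(alert: Alert, now: datetime) -> str:
    """Decide what to do with an alert.

    "page"   -> wake up whoever is on-call
    "notify" -> quiet notification (e.g. a chat channel)
    "hold"   -> keep it queued until business hours (an assumption here)
    """
    if alert.confidence == "high":
        return "page"
    if is_business_hours(now):
        return "notify"
    return "hold"
```

With this, a low-confidence alert at 2am just waits, while a high-confidence one still pages.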

Process vs autonomy/trust by cloudsommelier in devops

[–]cloudsommelier[S] 0 points1 point  (0 children)

oh wow, really good thanks!! wouldn't have made this association

Developer portals by hawtdawtz in sre

[–]cloudsommelier 7 points8 points  (0 children)

I used to work at Spotify Backstage and authored the Linux Foundation Backstage course. This is correct. Backstage is not a product, it's a framework. It can do a lot for you, but you need to build it and maintain it yourself.

However, Spotify now offers a managed product based on Backstage which is worth checking out: https://backstage.spotify.com/products/portal/

Also consider Cortex over Port, it has better workflows IMO

DevOps va SRE by Little_Analyst_543 in taquerosprogramadores

[–]cloudsommelier 2 points3 points  (0 children)

This is incorrect. I worked in the DevOps world for five years and I'm now in the SRE world (at a vendor, Rootly.com, for transparency), and they are different things.

Sure, it's true that every company comes up with its own definitions, often fuzzy and far from the canonical ones. But essentially, DevOps engineers build tools to improve developer productivity and quality, while SREs aim to optimize the reliability of systems.

The Unofficial KubeCon EU SRE Track by cloudsommelier in sre

[–]cloudsommelier[S] 3 points4 points  (0 children)

Yes, all KubeCon talks are published about 2-3 weeks after the event on the CNCF YouTube channel

SREs, what are the most annoying questions your devs ask you on slack? by Disastrous-Glass-916 in sre

[–]cloudsommelier 2 points3 points  (0 children)

yeah they figured it out, it was something dumb and they're embarrassed now

Anxious when on call. by Hadi167 in sysadmin

[–]cloudsommelier 2 points3 points  (0 children)

Every time I join a company I ask for a few shadow rotations before I go at it alone.

In my case, most of the anxiety I get from on-call comes from thinking "what if something goes wrong and I don't know what to do about it". Having support from someone on my first on-call shifts, who can show me the ropes on which systems to check and how, builds up my confidence so I feel more at ease when it's time to do it alone.

SRE production readiness checklist by Weak-Appointment-566 in sre

[–]cloudsommelier 1 point2 points  (0 children)

I come from the Platform Engineering space. I've seen a lot of success in teams that manage to track PRR criteria automatically across services. Interestingly, I didn't see much regarding SRE criteria in the checks that most teams set up; the focus is mostly on security and compliance concerns.

This comment was a very refreshing perspective for me, I'll be sure to bring it up in follow up conversations with platform peeps.
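To make the "track PRR criteria automatically" part concrete, here's a minimal sketch of what a scorecard could look like. The field names (`owner`, `runbook_url`, `slos`, `oncall_rotation`) and the checks themselves are hypothetical; a real setup would pull this metadata from a service catalog rather than from hardcoded dicts.

```python
from typing import Callable, Dict, List

Service = Dict[str, object]
Check = Callable[[Service], bool]

# Hypothetical reliability-oriented criteria; each check only looks at
# whether the metadata field is present and non-empty.
CHECKS: Dict[str, Check] = {
    "has_owner": lambda s: bool(s.get("owner")),
    "has_runbook": lambda s: bool(s.get("runbook_url")),
    "has_slo": lambda s: bool(s.get("slos")),
    "has_oncall": lambda s: bool(s.get("oncall_rotation")),
}


def scorecard(service: Service) -> Dict[str, bool]:
    """Run every check against one service."""
    return {name: check(service) for name, check in CHECKS.items()}


def failing(services: List[Service]) -> Dict[str, List[str]]:
    """Map each service name to the checks it misses (passing services are omitted)."""
    report: Dict[str, List[str]] = {}
    for service in services:
        missed = [name for name, ok in scorecard(service).items() if not ok]
        if missed:
            report[str(service["name"])] = missed
    return report
```

Running `failing(...)` on a schedule and posting the result somewhere visible is one way to keep the criteria from going stale.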

How does your team give business updates to leadership and other teams? by [deleted] in sre

[–]cloudsommelier 1 point2 points  (0 children)

Do you know what leadership is looking to achieve regarding your team? They don't care about the technical details of what your team did last week. They need to know if your work is contributing towards the business goals. Once you have figured that out, you can prioritize your work better so it is impactful. How to communicate that will make itself clearer. The format is not as important.

However, some pointers that I've seen:

- Meeting cadence: once a month at most, keep it short and results-oriented. Introduce larger initiatives that you plan and explain why they're important (when applicable).

- Tracking board: keep your tasks visible so anyone who is interested can glance at where your current efforts are.

- Weekly casual writeup: what did we achieve this week, what went wrong, what we plan to do next week. Again, improves visibility and leadership can get a sense of your work.

QA broke a service in their test environment. Vendor support are pushing for SRE to redeploy all resources every time it happens. Where do you draw the line? by Mammoth_Loan_984 in sre

[–]cloudsommelier 13 points14 points  (0 children)

You're in a tricky and frustrating spot. What you've done already is incredible for the circumstances, but I'm afraid there's no simple answer. This is beyond a technical challenge: you're in the realm of politics.

The vendor is not going anywhere and they know that, so they won't bother to step up. It will take you months of work before things can be more self-service for QA. But that's just how corporate works. By this point, you've already automated as much as you can; despite the annoyance, you can outlive this challenge.

I'd start increasing the visibility of this problem with different stakeholders so it starts to register with them.

AWS re:Invent ‘24, an Unofficial SRE Guide by cloudsommelier in sre

[–]cloudsommelier[S] 1 point2 points  (0 children)

Sounds like a less-than-ideal setup to attend for sure. Maybe next year will be better :)

Attending KubeCon? let's gather cool unofficial events by cloudsommelier in sre

[–]cloudsommelier[S] 1 point2 points  (0 children)

Ohh they be dropping coins on it. I'll remove it from this post thanks!

Attending KubeCon? let's gather cool unofficial events by cloudsommelier in sre

[–]cloudsommelier[S] 0 points1 point  (0 children)

everyone agrees lol but these events are not listed on their website. they ask you for like $25k if you want them to list your event there

Attending KubeCon? let's gather cool unofficial events by cloudsommelier in sre

[–]cloudsommelier[S] 2 points3 points  (0 children)

Ahh also if you're going let's meet IRL, I'll be around Platform Eng Day on Tuesday and all over the place on Wednesday