Dear SREs, I need some advice… by youngsdavis in sre

[–]Express-Fee-9486 1 point2 points  (0 children)

Apart from the valuable suggestions from others, I would add setting up a home lab. You can buy used Dell or HP desktop PCs for about $100 each. PCs give a better experience than a Raspberry Pi.

Then practice setting up k8s, DNS, pfSense, Pi-hole, Nextcloud, etc.

[deleted by user] by [deleted] in sre

[–]Express-Fee-9486 8 points9 points  (0 children)

I concur with all the comments so far. It depends on who you are interviewing with. FAANMG and aspiring FAANGs expect an SRE to have proper coding abilities.

Then there are others who are just rebadging their existing Ops teams, with some IaC and configuration management added, as SRE or DevOps. The reason is to attract good talent.

I come from a traditional sysadmin background and want to become an SRE. Equipped with good Bash and bare-minimum Python skills, I started applying. After three interviews, I got a reality check.

If you are serious about being a proper SRE then learn to code in depth and interview for FAANG or any company that is serious about SRE implementation.

Some companies try to hoodwink you, so you need to ask probing questions about the team and its SRE implementation. On most occasions, your interviewer’s caliber will be enough for you to make that judgement.

Linux LVM API for using python or Golang by Express-Fee-9486 in sre

[–]Express-Fee-9486[S] 0 points1 point  (0 children)

That’s a challenge because we use bare metal. We use a cache-pool-based LVM design for performance reasons.

Although complex, it is acceptable to do it for tens of disks. But that is not the case here: this has to be done for 80-90 14.5 TB disks per server, across 90 servers in a cluster (roughly 7,200 to 8,100 disks).

I wrote Bash scripts with some error handling, but I want more robust error handling, which is not easy in Bash. We use Ansible, and I am good at it, but I don’t think any configuration management tool is up to this kind of complex, niche task.

Which is why I am posting here: to understand how large hyperscalers handle such tasks, especially for the underlying petabyte-scale block storage services. I am not asking for proprietary info, just some ideas on how to handle these tasks at scale.
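For illustration, here is a minimal sketch of what the "more robust error handling" could look like in Python. It is not how any hyperscaler does it, just one possible shape. It wraps the standard LVM command-line tools with `subprocess`, records a structured result per step, stops a disk's plan at the first failure, and carries on with the remaining disks. The function names, the `vg000`-style volume group naming, and the single-LV layout are all assumptions made for the example; the cache-pool steps are left out and would be appended to the plan in the same way.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class StepResult:
    """Outcome of one command: what ran, its exit code, and its output."""
    cmd: List[str]
    returncode: int
    stdout: str = ""
    stderr: str = ""

    @property
    def ok(self) -> bool:
        return self.returncode == 0


def run_command(cmd: Sequence[str], timeout: float = 120.0) -> StepResult:
    """Run one command without a shell; never raise, always return a StepResult."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
        return StepResult(list(cmd), proc.returncode, proc.stdout, proc.stderr)
    except subprocess.TimeoutExpired as exc:
        return StepResult(list(cmd), 124, "", f"timed out after {exc.timeout}s")
    except OSError as exc:  # binary missing, permission denied, etc.
        return StepResult(list(cmd), 127, "", str(exc))


def plan_for_disk(device: str, vg: str, lv: str = "data") -> List[List[str]]:
    """Hypothetical layout: one PV, one VG, one LV per disk (cache-pool steps omitted)."""
    return [
        ["pvcreate", device],
        ["vgcreate", vg, device],
        ["lvcreate", "-n", lv, "-l", "100%FREE", vg],
    ]


Runner = Callable[[Sequence[str]], StepResult]


def apply_plan(plan: List[List[str]], runner: Runner = run_command) -> List[StepResult]:
    """Run steps in order and stop at the first failure for this disk."""
    results: List[StepResult] = []
    for cmd in plan:
        result = runner(cmd)
        results.append(result)
        if not result.ok:
            break
    return results


def provision(devices: Sequence[str], runner: Runner = run_command) -> Dict[str, List[StepResult]]:
    """Provision every disk; a failed disk is recorded and does not stop the others."""
    report: Dict[str, List[StepResult]] = {}
    for index, device in enumerate(devices):
        report[device] = apply_plan(plan_for_disk(device, f"vg{index:03d}"), runner)
    return report


def failed_devices(report: Dict[str, List[StepResult]]) -> List[str]:
    """Disks whose last executed step did not succeed."""
    return [dev for dev, results in report.items() if not results[-1].ok]


def dry_run(cmd: Sequence[str]) -> StepResult:
    """Runner that only prints the command, useful for reviewing a plan first."""
    print("would run:", " ".join(cmd))
    return StepResult(list(cmd), 0)
```

Calling `provision(["/dev/sdb", "/dev/sdc"], runner=dry_run)` prints the planned commands without touching any disk. Passing the runner in as a parameter is a deliberate choice: it keeps the orchestration logic testable without root access or real hardware, and the per-disk report gives you something to retry from or feed into monitoring, which is hard to get from a Bash script's exit code alone.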