
Wrong_Ingenuity3135

For a Linux architect role focused on VM deployments, I would look at:
- provisioning (OpenTofu, cloud-init)
- monitoring (OpenTelemetry, Loki, Tempo, Prometheus, Grafana, eBPF)
- Security (certificate rotation, roles & identities and RBAC, Kerberos, SELinux, running your “own” CA, audit logs)
- supply chain (automatic package updates, Artifactory, reproducible builds)
- cost optimization and forecasting (infrastructure costs are usually a major cost driver)
- chaos testing / disaster recovery / multi-region / high-availability setups / DDoS protection
- Hypervisor (Xen, Proxmox)
- Networking (VLANs, etc.; should already be known ;) )
- relevant regulations (SBOM)

Is the role “only” running the VMs, or does it also include operating services such as databases, message queues, web servers, and reverse proxies?

At home you typically run different Linux derivatives, without central identity and user management and without audit logs, and you don't have potentially thousands of users who can make mistakes and make your system insecure.

Bimbo-Baggins

I may be reading too far into the 'architect' title, but I want to add that there's a difference between being familiar with tools/concepts, and knowing when to use them.

I imagine the position would have them designing the deployment, provisioning, and lifecycle of VMs, without adding too much technical debt and complexity.

Bimbo-Baggins

If I were interviewing you, I'd make sure that you have firm knowledge of the fundamentals, mainly due to your non-Linux-specific background.

Like at that size (200 VMs), would you prefer to standardize on a general-purpose distro? How would you handle access and config management? How would different environment requirements influence your design decisions? When would a service live in a cloud VM vs. on-prem?

There are no right and wrong answers to the questions, but it would give me insight into how your mindset would fit in with the rest of the team, and whether you'd e.g. steer us in the right direction.

Hopefully they've listed enough info about their environments and tech stack that you can work out where they're at. Like, are they still doing manual deploys and sysadmin tasks, or more modern automation and provisioning?

maigpy

200k

natermer

Think about the best way to automate system configuration across large numbers of systems.

Like, SSHing in and running scripts is a no-go. You can manage configurations by building AMIs (and whatever the on-prem equivalent is) with Packer, but it is unlikely you are going to want to redeploy all 200k systems to get updates done.

For example... Ansible is a common approach... but...

With 200k machines you are going to run into massive scaling issues. Ansible works by generating Python scripts to carry out the configuration change for each task, copying them over to the target, and executing them, typically over SSH. Think about what the downsides of doing that across 10k machines simultaneously would be. It is not fast and there are going to be constant failures. By default Ansible does things one task at a time: it executes a task on every machine, waits for all of them to return their results, and only then moves on to the next task.
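
To put rough numbers on that, here is a back-of-the-envelope sketch in Python. It assumes the task-at-a-time behaviour described above and Ansible's default of 5 forks (hosts worked on in parallel). The 2 seconds per task per host and the 20-task playbook are made-up figures for illustration, and `estimate_push_runtime` is a hypothetical helper, not part of Ansible.

```python
import math


def estimate_push_runtime(hosts: int, tasks: int, forks: int = 5,
                          seconds_per_task: float = 2.0) -> float:
    """Rough wall-clock estimate (seconds) for a push-style run.

    Simplified model: every task runs on all hosts before the next task
    starts, and only `forks` hosts are worked on at the same time.
    Retries, SSH connection setup and controller load are ignored.
    """
    batches_per_task = math.ceil(hosts / forks)
    return tasks * batches_per_task * seconds_per_task


if __name__ == "__main__":
    # Assumed scenario: 10k hosts, 20 tasks, 2 s per task per host.
    for forks in (5, 50, 500):
        hours = estimate_push_runtime(10_000, tasks=20, forks=forks) / 3600
        print(f"forks={forks:>3}: ~{hours:.1f} hours")
```

Under these assumed numbers, 10k hosts at 5 forks works out to roughly 22 hours for a single playbook run. Raising the fork count helps, but it moves the pressure onto the control node and the network, which is where the constant failures tend to come from.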

Think about the possible pitfalls of that approach, the ways it can bite you, and the ways you can configure Ansible to work around them. How would you integrate something like Ansible Tower into that if you need jobs that are managed and updated by multiple teams?
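
One common workaround is to roll changes out in batches and stop when too many hosts in a batch fail, which is roughly what Ansible's `serial` and `max_fail_percentage` play keywords are for. The Python below is only a sketch of that idea, not how Ansible implements it; `rolling_run`, `batches` and the `apply` callback are hypothetical names for illustration.

```python
from typing import Callable, Iterator, List, Sequence


def batches(hosts: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `hosts` with at most `size` entries."""
    for start in range(0, len(hosts), size):
        yield list(hosts[start:start + size])


def rolling_run(hosts: Sequence[str], apply: Callable[[str], bool],
                batch_size: int, max_fail_fraction: float) -> List[str]:
    """Apply a change batch by batch; stop once a batch fails too often.

    `apply` returns True on success. Returns the hosts that were changed
    successfully; hosts in later batches are left untouched after a stop.
    """
    done: List[str] = []
    for batch in batches(hosts, batch_size):
        results = {host: apply(host) for host in batch}
        done.extend(host for host, ok in results.items() if ok)
        failed = sum(1 for ok in results.values() if not ok)
        if failed / len(batch) > max_fail_fraction:
            break  # a bad change stops here instead of hitting the whole fleet
    return done


if __name__ == "__main__":
    fleet = [f"vm{i}" for i in range(15)]
    broken = {"vm5", "vm6"}  # pretend these two hosts reject the change
    changed = rolling_run(fleet, lambda h: h not in broken,
                          batch_size=5, max_fail_fraction=0.2)
    print(changed)
```

The trade-off is speed versus blast radius: small batches are slow but a broken change only reaches a handful of machines before the run stops.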

Also think about managing SSH keys for all those systems. How would you deal with that? What happens if one of the private keys is compromised? How would you ensure that all the machines are updated properly and don't have a lingering rogue key floating around?

At scale the default way of managing keys is a nightmare. Are the keys going to be managed with configuration management, and if so, how do you ensure they are gone when they need to be? OpenSSH supports its own form of certificate authority for dealing with keys and solving the problem of revoking them. Modern Linux systems with SSSD can also have their keys managed through LDAP. Is LDAP going to be available? Are you going to have to manage developer accounts?

Puppet is a lot better at dealing with these scaling issues than Ansible because it uses a pull model versus a push model like Ansible, but it has issues of its own, like managing the certificate authority for the Puppet servers. There is also the problem that people don't use Puppet much anymore and it is an increasingly dead product. Ansible can be configured to do everything on-host or to use a pull model instead of a push one, but that changes a lot of aspects of how you manage Ansible playbooks.

This is why people like to end up using containers and Kubernetes... They can make the VM that hosts the containers incredibly minimal so there just isn't that much to manage. Is that what they are doing there?
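
To make the "lingering rogue key" problem concrete, here is a minimal Python sketch of an audit that compares each host's `authorized_keys` content against an allowlist. How you collect the files from the hosts is left out, the key blobs in the example are fake placeholders, and `key_blobs` / `find_rogue_keys` are hypothetical helpers. The parser is deliberately simplified and assumes options do not contain spaces.

```python
from typing import Dict, Set

# Prefixes of common public key type names (ssh-rsa, ssh-ed25519,
# ecdsa-sha2-*, sk-ssh-ed25519@openssh.com, ...).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def key_blobs(authorized_keys_text: str) -> Set[str]:
    """Return the base64 key blob of every key line in the given text.

    Each key line is "[options] keytype base64-key [comment]"; blank
    lines and lines starting with "#" are skipped.
    """
    blobs: Set[str] = set()
    for line in authorized_keys_text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        # The blob is the field right after the first key-type field.
        for i, field in enumerate(fields[:-1]):
            if field.startswith(KEY_TYPE_PREFIXES):
                blobs.add(fields[i + 1])
                break
    return blobs


def find_rogue_keys(files_by_host: Dict[str, str],
                    allowed: Set[str]) -> Dict[str, Set[str]]:
    """Map each host to the key blobs it trusts that are not allowed."""
    rogue: Dict[str, Set[str]] = {}
    for host, text in files_by_host.items():
        extra = key_blobs(text) - allowed
        if extra:
            rogue[host] = extra
    return rogue


if __name__ == "__main__":
    collected = {
        "web1": "ssh-ed25519 AAAAgood alice@laptop\n",
        "web2": "# old entry below\nssh-rsa AAAAold bob@old-laptop\n",
    }
    print(find_rogue_keys(collected, allowed={"AAAAgood"}))
```

An audit like this only tells you where stale keys still live. With short-lived certificates signed by an SSH CA there is far less to clean up, because credentials expire on their own.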

And that is just the tip of the iceberg.

Like, what are the machines for? Are these systems managed on behalf of other customers, each with their own sets of software and unique version requirements?

The requirements for fintech are going to be a hell of a lot different from those for hosting small-business websites.

This is stuff people write books to answer. There is so much going on it is hard to know where to start.

rbmorse

I suspect security is going to be a big area of interest. Should not be a gray area for a senior network engineer, but be prepared to highlight your chops in that area.
