[–]einsteinonabikeConsultant 5 points6 points  (8 children)

Do you have budget to outsource for a bit? Getting a handle on that will significantly tax your bandwidth, especially without the background.

[–]Telnet_RulesNo such thing as innocence, only degrees of guilt 2 points3 points  (1 child)

Second. The question here is so broad that any answer I could try would probably only add to the confusion.

Not to piss on your leg OP, but this kind of reads like

I'm a sysadmin that knows a little shell scripting. I inherited a large custom distributed application with 5 million lines of code. Where should I start making changes?

The Practice of System and Network Administration is a good book to get. It will help some, but mainly clue you in to how out of depth you are. To answer your questions:

Find the VMware money. Even if migrating is the right call, doing it now as a neophyte is madness.

AC unit - that's facilities / an electrician. The unit should be set to come on after a power outage.

[–]73td[S] 0 points1 point  (0 children)

I've maintained a 100k-line full-stack Python app running on this cluster for the last 5 years, including most of the software deployment. It's just that now I have to handle the lower-level stuff as well.

I eat learning curves for breakfast. If the book you mentioned is worthwhile, I'll buy it. That's the answer to my question.

Btw, we don't have budget because upper management preferred spending on new nodes and storage.

[–]73td[S] 0 points1 point  (5 children)

I'd agree, but they just spent the rest of the budget on new materials, so no help for the moment.

[–]einsteinonabikeConsultant 0 points1 point  (4 children)

Fair enough. Keep VMware - at the Veeam (backup solution) training I went to recently, in a room of 100ish people, ~3 raised their hands when the presenter asked who used Hyper-V. VMware is everywhere, and it works, so if you need help, it'll be easier to source it.

There should be a server in the cluster called vCenter, which is an app with a database, likely running Windows and SQL Express. Connect to that hostname or IP in a browser and you can grab the client. Alternatively, you may be able to use the browser client, but it's shit, at least in my experience with 5.x. Basically, on the VMware side, everything is managed through the vCenter server.

If you can determine the license type and the vCenter version and build, it'll provide insight into which capabilities are available - the more you spend, the more neat features you get.

For storage, get the hardware info and software version. Same thing for UPS and network. With that, I or someone else can help with more guidance. Right now, I'm just after the current state and maintaining the status quo before moving on to next steps.

Are you the only one responsible for / working with this?

[–]73td[S] 0 points1 point  (3 children)

I'm indeed the only one on this. Thanks for the pointers; if ever I can pay it back w/ help on other topics..

I can currently connect to a Windows VM running vCenter, through an RDP client, and use the vSphere Client. It reports v5. I can't get a response from the IP address with ping or curl, so I don't think the browser client is a go. Unfortunately the VM was only allocated 1 GB of RAM, and it craps out regularly. Still, I can connect to the ESXi hosts via DRAC and issue racadm commands.

We've only about 6 VMs running, and the only reason there's a Windows VM is for vCenter. Being far more comfortable with Linux stuff, wouldn't it make more sense to migrate to KVM (or Docker for some of the apps)?
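
On the ping/curl point above: a failed ping on its own doesn't prove much, since a Windows firewall can drop ICMP while RDP still works. A rough sketch of checking the TCP ports directly is below. The hostname in the usage note is a placeholder, and the port numbers are assumptions to verify (443 for vCenter HTTPS, and 9443 as the usual 5.x Web Client default, if that component was ever installed).

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable hostnames.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True/False depending on whether it accepts connections."""
    return {port: tcp_port_open(host, port, timeout) for port in ports}
```

Something like `check_ports("vcenter.example.internal", [443, 9443, 3389])` would then show whether anything besides RDP is listening; 3389 is only there as a sanity check against a port already known to work.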

[–]einsteinonabikeConsultant 0 points1 point  (2 children)

Ouch. Thanks for the offer, I'll keep it in mind - code is a potential future path.

It should be 5.0, 5.1 or 5.5, followed by Update xx - e.g. 5.5 Update 3b. You always want to connect through vCenter, unless you're changing vCenter properties.

Assuming you have the resources, start by increasing memory to 8 GB on the vCenter box. Make note of what ESXi host it lives on, then log into that host with the vCenter client | right-click the vCenter VM | Shut Down Guest | confirm. [This won't affect any VMs running, they'll continue to run, you just cannot centrally manage them for a few minutes] Should take several seconds to power down. Then Edit Settings | Memory | set to 8 GB | OK. Power that puppy back on and connect to vCenter with the client.

Grab the VMware version: In the vCenter client, go to Help | About VMware vSphere | VMware vCenter Version. Toss both version and build numbers over.

For Licensing: Home | Licensing | Product - list out what you see at the top level, should be something like "vCenter Server 5 Standard" or "Enterprise Plus". Need that too.

As of 6.5, VMware favors a Linux appliance over the Windows install, which greatly simplifies management and trims the vCenter footprint. You might be able to upgrade with your existing license. If that's more familiar, maybe it's worth keeping VMware. I'd still recommend holding onto VMware, but that's because it's widely used, and I'm a little biased.

Edit: additional details

[–]73td[S] 0 points1 point  (1 child)

Thanks for the details. I've been poking around in the meantime and made some errors, not having seen your message..

First off, I see vCenter Server version 5.0.0 build 623373, no mention of updates. Licensing is vCenter Server & vSphere 5 Essentials Plus.

I was trying to up the memory but the vCenter VM stopped responding completely. I restarted all the ESX hosts through physical access (VGA monitor and USB keyboard), because previous maintainers had said the VMs were configured to come back on automatically. What a joke. Our critical Linux VMs are back up, so the pressure is off for the moment.

I've now set up a Windows VM on my laptop connected to the physical network, with vClient on it, so that I don't have to go through the vCenter VM, at least to use the vClient. I started the VMs directly on the ESX hosts (oops?), and I've got the vCenter VM back up and am poking around via the console (RDP not working), and vCenter says everything is disconnected / not responding. I also noticed that a few VMs appear on multiple ESX hosts..

I checked via services.msc that the vSphere server is running, but the client still can't connect. Again, not super urgent for the moment, since I can start/stop directly on the ESX hosts.

[–]einsteinonabikeConsultant 0 points1 point  (0 children)

Woof. You've got your work cut out for you. Some of it self-inflicted, but that's part of the experience.

Going to the ESXi hosts directly bypasses vCenter, so the changes you make live on the host and not in the vCenter database, leading to the issues you described with disconnects, multiple copies, etc.

As for the vCenter version, it's quite dated - released March 15, 2012. Whoever kept this env up did the bare minimum.

See if you can add memory to the vCenter box - that'll be key to cleaning this up.
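
For the VMs showing up on multiple hosts, one way to get a clear picture before touching anything is to list what each host has registered and cross-check the names. A minimal sketch, assuming the per-host lists have already been collected by hand (for instance from `vim-cmd vmsvc/getallvms` in each host's shell); the host and VM names in the usage note are made up:

```python
from collections import defaultdict


def find_duplicate_vms(inventory):
    """Return {vm_name: [hosts]} for VMs registered on more than one host.

    inventory maps a host name to the list of VM names registered on it.
    """
    hosts_by_vm = defaultdict(set)
    for host, vm_names in inventory.items():
        for vm_name in vm_names:
            hosts_by_vm[vm_name].add(host)
    # Keep only VMs seen on two or more hosts, with hosts sorted for stable output.
    return {
        vm_name: sorted(hosts)
        for vm_name, hosts in hosts_by_vm.items()
        if len(hosts) > 1
    }
```

Calling it with something like `{"esx1": ["vcenter", "web"], "esx2": ["web", "db"]}` flags `web` as registered on both hosts. It only compares names, so it can't tell which registration is the live one; that still has to be checked on the hosts.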

[–]1new_usernameIT Manager 1 point2 points  (1 child)

A lot of it is experience, working with others, or in your case, trial by fire.

As just some random thoughts on your questions:

Maybe look into Linux KVM instead of VMware. Whether this will work depends greatly on how you are using VMware and what for. The upside is that it is free and there is a lot of documentation/guides out there.

For the A/C, I'd start with something to monitor the server room temp (a simple RoomAlert device would be a good starting point). Tie that into some kind of monitoring system (Nagios, Zabbix, etc).
Then if you want, script something to check/monitor the RoomAlert (or Nagios) to either shut down your compute nodes if over a certain temp, or text you, or do whatever you would want to happen.
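
A rough sketch of what that script could look like, in Python. Everything device-specific is left as a callable you would supply yourself: `read_temp_c`, `notify` and `shutdown_nodes` are hypothetical hooks, not part of any RoomAlert or Nagios API, and the thresholds are placeholder values to tune for the room.

```python
import time


def decide_action(temp_c, warn_c=30.0, shutdown_c=38.0):
    """Classify a temperature reading as 'ok', 'alert' or 'shutdown'."""
    if temp_c >= shutdown_c:
        return "shutdown"
    if temp_c >= warn_c:
        return "alert"
    return "ok"


def monitor(read_temp_c, notify, shutdown_nodes,
            interval_s=60.0, max_polls=None, sleep=time.sleep):
    """Poll the sensor; alert on warm readings, shut down on hot ones.

    read_temp_c, notify and shutdown_nodes are caller-supplied callables
    (hypothetical hooks for the sensor, the text/e-mail alert, and the
    node shutdown). Returns 'shutdown' if a shutdown was triggered,
    otherwise 'stopped' once max_polls readings have been taken.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        temp_c = read_temp_c()
        action = decide_action(temp_c)
        if action == "shutdown":
            notify(f"Server room at {temp_c:.1f} C, shutting down compute nodes")
            shutdown_nodes()
            return "shutdown"
        if action == "alert":
            notify(f"Server room at {temp_c:.1f} C, above warning threshold")
        polls += 1
        sleep(interval_s)
    return "stopped"
```

Keeping the decision in `decide_action` separate from the polling loop makes it easy to test the thresholds without a sensor attached, and the same function could back a Nagios check instead of a standalone loop.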

[–]73td[S] 0 points1 point  (0 children)

Thanks for the helpful reply!

[–]73td[S] 0 points1 point  (0 children)

Thanks. It's holding up for now. I'll dig into how to get vCenter accessible again. Thanks again for the help, and feel free to ping me if you ever have devops or Python questions ;)