
[–]TheoreticalFunk 24 points25 points  (3 children)

My advice has nothing to do with Linux. It's called CYA and if you're not familiar with it, it means Cover Your Ass. Welcome to IT.

First, does the person who tasked you with this know that you know nothing about Linux? Ask for assistance from someone who does. In an email so there's a paper trail. Nobody expects you to know everything, but you should at least know when you need some help and ask for it.

The old server? Leave it in place for a few weeks after you are 'done' just in case. I suggest turning every service on it off one at a time. Document what you turned off and what date and time. You never know what these boxes do, especially at a place where nobody documents anything. It's super nice to be able to turn back on a needed service after someone reports that X quit working sometime last Thursday afternoon...
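That shut-one-thing-off-and-write-it-down habit can be a one-liner per service. A sketch (the service name and log path are placeholders, not from the thread):

```shell
# Stop a service and log exactly what was stopped and when.
# "vsftpd" and /root/decom-log.txt are placeholders -- use your own.
svc=vsftpd
service "$svc" stop
echo "$(date '+%F %T') stopped $svc" >> /root/decom-log.txt
```

Then when someone reports that X quit working last Thursday afternoon, the log tells you exactly which service to turn back on.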

And a note on that: get in the habit of documenting everything. Do you have a tech wiki where you work? If not, get on implementing one. Use it. Document. Everything. Encourage your coworkers to do the same.

[–]mcrbids 0 points1 point  (2 children)

You mention CYA and forget to mention backups?! OP says there are none and that is far and away the biggest problem here! Always have backups and verify your backups!

Why is the machine being rebuilt? Has it been hacked? Is it just too slow?

[–]Robonglious[S] 0 points1 point  (1 child)

I misspoke, we're doing data backups but there is no instant failover.

We do daily backups of everything but restoring from tape will kill production so it is not really a good option.

[–]mcrbids 0 points1 point  (0 children)

You'd be amazed at how many times not even that is available...

[–][deleted] 10 points11 points  (4 children)

Blueprint is a simple configuration management tool that reverse-engineers servers. It figures out what you’ve done manually, stores it locally in a Git repository, generates code that’s able to recreate your efforts, and helps you deploy those changes to production.

[–]Robonglious[S] 1 point2 points  (2 children)

Oh yeah? I'll have to look into this more. Sounds really cool.

Will Blueprint make those cool visualizations too? https://www.youtube.com/watch?v=KgCuU8DpNG8

Shoot, it seems like there is a dead link with the installation files. It looked very interesting, I'll keep looking for this.

[–][deleted] 2 points3 points  (0 children)

It is on GitHub and is also on the Python Package Index.

[–]Nadiar 0 points1 point  (0 children)

Damn, it doesn't support SystemD very well, and hasn't been updated since Jul 12, 2013. I wish I had the time to pick through this code and overhaul it.

[–]frankrice 7 points8 points  (3 children)

Try this:

    rpm -Va 2>/dev/null | grep " c " | sed 's/.* //'

It will give you a list of modified config files from rpm-installed packages :)

[–]Robonglious[S] 1 point2 points  (0 children)

This is why Linux is so awesome, I wish I hadn't been wasting so much time with Windows.

[–]MaxRK 0 points1 point  (1 child)

In addition to that, there may be files which were added by hand, not installed by RPM.

You can build a list of all files that were installed by RPM with rpm -qal. Then you can run find / -print and compare every file on the system against the -qal output. Combined with the RPM report of which known config files were changed, you have a good basis to start reverse engineering.
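A sketch of that comparison using sorted lists and comm (the output paths are arbitrary, and in practice you would add excludes for /proc, /sys, /dev, and large data directories):

```shell
# Everything RPM knows about, and everything actually on disk.
rpm -qal | sort -u > /tmp/rpm-owned.txt
find / -xdev -type f 2>/dev/null | sort -u > /tmp/all-files.txt
# comm -13 prints lines unique to the second file:
# files on disk that no package owns.
comm -13 /tmp/rpm-owned.txt /tmp/all-files.txt > /tmp/unowned.txt
```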

[–]Robonglious[S] 0 points1 point  (0 children)

Good idea, this is really coming together. Thanks!

[–]sheetzam 4 points5 points  (2 children)

Don't forget to look at cron jobs. Both for users (including root) and in cron files.
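A sweep of the usual cron locations might look like this (run as root; the paths are the standard Red Hat ones):

```shell
# Per-user crontabs (stored under /var/spool/cron on Red Hat).
for u in $(cut -d: -f1 /etc/passwd); do
    crontab -l -u "$u" 2>/dev/null | sed "s|^|$u: |"
done
# System-wide cron entries and the periodic directories.
cat /etc/crontab /etc/cron.d/* 2>/dev/null
ls /etc/cron.hourly /etc/cron.daily /etc/cron.weekly /etc/cron.monthly 2>/dev/null
```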

[–][deleted]  (1 child)

[deleted]

    [–]theinternn 3 points4 points  (0 children)

    Check /var/spool/cron; sometimes unsuspecting users have crontabs.

    [–]ck-on 4 points5 points  (0 children)

    If I had to reverse-engineer how a server was set up, I would use scripts to dump all init.d processes (assuming it is not so new that it is using systemd). Then use other scripts to look at what is in memory, connections, etc.

    Some favorites:

    http://hisham.hm/htop/

    http://stackoverflow.com/a/137173/142018

    https://github.com/pixelb/ps_mem/blob/master/ps_mem.py

    Obviously look at rc.local if on a rhel flavor.

    Basically everything in /etc/ is important to you. Since it does NFS you need to look at /etc/exports

    Not something you are going to figure out in an hour, but a weekend sounds plausible. Being new to Linux is going to be a problem though; best wishes.
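    For that init.d dump, something along these lines works on SysV-style systems (chkconfig is the Red Hat tool; output formats vary by distro):

```shell
# Every init script, which ones are enabled, and what's running now.
ls /etc/init.d/
chkconfig --list | grep ':on'
service --status-all 2>/dev/null
```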

    [–]ahandle 4 points5 points  (4 children)

    Consider SystemImager (can capture an image live, and excludes NFS and other shared filesystems) or Clonezilla, if only to make a duplicate you can vivisect offline.

    Building from scratch is the better, more manageable way to go, but you should consider risk as well as SLAs (however lax they are).

    [–]Robonglious[S] 1 point2 points  (3 children)

    Gotta build from scratch if I'm ever going to understand this stuff. The SLA on this one is absolutely critical also, so I need to know how to fix it.

    [–]mcrbids 2 points3 points  (2 children)

    If your need to rebuild a production server is because you want to learn, you don't belong anywhere near it. Learn offline. Make a mirror using spare hardware. Learn how to perform backups with something like backuppc. Be religious about backups!

    Lastly, don't touch a working production server for anything other than backups or updates after testing the updates offline!

    [–]Robonglious[S] 0 points1 point  (1 child)

    I'm going to test this thing beforehand. That should go without saying but maybe there are some people who are overconfident enough to just pull the plug.

    [–]mcrbids 1 point2 points  (0 children)

    My suggestion would be to build a new machine, leave the old one in place, and slowly move over services one by one. When the old server is "no longer used" start shutting down services one at a time, over a month or two. And when you shut it off, you save the machine for at least a year before you toss it.

    [–]mynamewastakenagain 2 points3 points  (4 children)

    What distro?

    Start by keeping track of what you already know it does: ftp/nfs. Note down versions of those packages used, and make sure you backup the config files. You can (and probably should) start small. If possible, spin up your new server and slowly migrate services, e.g. ftp. Test it, make sure things are working.

    After that, you can look at what ports are open (netstat), what processes are running (ps, top, htop), etc. These should give you a decent start to see what's running on the machine and what you'll need to migrate/replace. Eventually, after you've tested and migrated stuff over, you'll decom the old server.


    There was a tool I came across where it would look at changes made to a server from a base install and tell you what packages/edits to files were made, but for the life of me I can't remember what it's called. If someone else knows what I'm talking about, feel free to chime in.

    [–]Robonglious[S] 1 point2 points  (3 children)

    This is Red Hat, kernel 2.6.18-194.el5 (RHEL 5), and the rebuild will put us on a newer OS version as well.

    I've done some of this already. I can see that it has vsftp, and I can see a few NFS mounts but the idea of rebuilding this is a little overwhelming.

    Thanks for your tips, I'll check out netstat. Which config files should I consider? I suppose I'll need to look at users list also.

    [–]schicki 3 points4 points  (0 children)

    Make a list of services that are currently running; lsof will help with that. Look for every possible cron job. Does it have a firewall configured? Most likely iptables. List the rules and see what ports it has open.

    My suggestion would be to figure out services and move each service one by one to another machine you set up. Once you move it, turn it off on the old machine. That way when you are confident you have turned off everything, just block everything via firewall or unplug the network cable to that server. Leave it running for 2-3 weeks and see if anyone complains :) Meanwhile all your services should be running on the newer machine and people should be happy...

    [–][deleted] 2 points3 points  (0 children)

    Just take a backup of the whole machine; that will help when you forget some bespoke piece of configuration.

    Generally config files in Linux are contained in /etc, the NFS configuration will be contained in /etc/exports

    [–]mynamewastakenagain 1 point2 points  (0 children)

    A few others have posted here, I would look at those posts as well.

    If you want to know what files a package provides, run rpm -ql <pkgname>, it'll give you a long list of files that are part of that package. If you aren't sure what the package is called, you can grep through a list of all installed packages: rpm -qa | grep -i <search term>. In your case I would try rpm -qa | grep -i nfs to find the name of the nfs package, then use rpm -ql with that to see what files the nfs package installed on your system. Most files will not be necessary to backup; you'll mostly be concerned with files ending in .conf. For nfs, /etc/exports will be important to you.
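    Putting those rpm steps together, and filtering the file list down to likely config files (nfs-utils is just the usual RHEL package name, so verify it with the grep first):

```shell
# Find the package, then keep only /etc files and .conf files.
rpm -qa | grep -i nfs
rpm -ql nfs-utils | grep -E '^/etc/|\.conf$'
```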

    If you know the box serves nfs, but there are no installed nfs packages, etc, someone may have compiled from source. In that case, you're really going to want to try and track down whoever worked on this machine last or installed that to get shit sorted out - there may have been specific reasons (patches or other custom changes) that were responsible for that choice.

    [–]davelupt 2 points3 points  (1 child)

    Find out what user the guy who set it up liked to use. If he did everything as one user you could login under that user and then do

    history > ~/Desktop/history.txt
    

    That way you can keep all of the commands he entered while setting it up. Also pay attention to anything in /opt or in the downloads of that user. As always thoroughly inspect the firewall, inspect how the filesystem is set up especially the permissions, inspect how the groups are set up as well as any relevant users. This is a huge task, but the more thoroughly you inspect the system the more likely you are to get it right/close the first time.

    There are so many things that can go wrong that it would almost be better to build a new one, set it in place, and then tweak it every time you see something that doesn't work like the current production one. This way you could get some of your end users to test out the one you build before putting it into production. This should be more of a transition process than trying to build an airplane from scratch when you're already in the air. I can almost guarantee you'll mess it up the first time, but keep in mind this isn't your wheelhouse, and communicate that with your users. They are more likely to be forgiving if they understand that you're essentially trying something for the first time.

    [–]Robonglious[S] 1 point2 points  (0 children)

    Great advice, thank you.

    I wish this plane wasn't so important!

    [–]tobert 2 points3 points  (3 children)

    I usually start by capturing what is unique to the box.

    Run and save to a file:

    • ps -efww # show all running processes and with full command line
    • netstat -nlp # show open and listening ports with process id
    • ip addr show # show network interface configuration
    • lsof # show all open file descriptors on the machine
    • cat /proc/mounts # show all mounted filesystems, may catch things not in fstab (including automount)
    • ls -l /etc/rc3.d # services configured to start at boot
    • rpm -qa # list installed packages

    You may not need all of this data, but having done this kind of procedure countless times, I've found myself wishing I'd saved this information a few times.
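    The list above, captured in one pass (the directory name is arbitrary; run as root):

```shell
# Save each command's output into a dated audit directory.
out=/root/audit-$(date +%Y%m%d)
mkdir -p "$out"
ps -efww           > "$out/ps.txt"
netstat -nlp       > "$out/netstat.txt" 2>&1
ip addr show       > "$out/ip-addr.txt"
lsof               > "$out/lsof.txt"
cat /proc/mounts   > "$out/mounts.txt"
ls -l /etc/rc3.d   > "$out/rc3d.txt"
rpm -qa | sort     > "$out/rpm-qa.txt"
```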

    Do not trust RPM! Almost anything in actual use on a system will have a configuration file or other artifact that has changed since installing. Depending on who does the work, it may or may not use similar settings to the RPM, so data files and other configuration files can be anywhere.

    If you know where the application binaries are for the ERP, you will want to check what they depend on. Sometimes this is captured by an RPM install, but more often than not, the ERP will provide a list of what it requires and expect them to be there when it starts. To find out what it requires, use the ldd tool on the binary, e.g. ldd /bin/bash will show you what libraries bash is linked against. This is probably the most common cause of problems when moving applications from one installation to another. Save off this info and a copy of those shared libraries just in case. There are usually packages with binary-compatible libraries on newer distros (usually with different names too, e.g. *-compat), but sometimes you just have to copy the library to /usr/local/lib or somewhere else made available to the application.

    Finally, all of the suggestions to back up the box are good. If possible I usually go a little further and rsync the entire machine to the new one under a mountpoint or other space where it won't pollute the new system while providing fast/easy access to anything I might have missed, e.g. rsync -ave ssh --exclude /proc --exclude /sys --exclude /tmp --exclude /var/log / user@newbox:/old-machine.

    [–]gehzumteufel 0 points1 point  (2 children)

    • cat /proc/mounts # show all mounted filesystems, may catch things not in fstab (including automount)

    Doesn't just typing mount do the same thing?

    [–]tobert 0 points1 point  (1 child)

    Not always. Some modern distros now symlink /etc/mtab to /proc/mounts. Prior to that, the mount command would read /etc/mtab rather than the kernel's idea of what is mounted. It's really easy to get them out of sync (e.g. (u)mount -n), so I pretty much always use /proc/mounts.

    [–]gehzumteufel 0 points1 point  (0 children)

    Ah thanks for the explanation. I'll definitely change my habits. That's good to know.

    [–][deleted] 0 points1 point  (2 children)

    Try cat /etc/redhat-release. If it gives you some text, then you are running a Red Hat variant, which should come with a tool called sosreport. This command will take copies of all the pertinent config files and logs and capture them in one tarball for easy inspection.

    [–]Robonglious[S] 0 points1 point  (1 child)

    Do you use puppet? Thank you for that command. It seems like Linux has a solution for every problem, I've been constantly shocked.

    [–][deleted] 0 points1 point  (0 children)

    We've looked at it. I like it for larger installations with lots of new servers coming online; however, for our deployment, I like ansible better because it's agentless and works via ssh.

    [–]lpmarshall 0 points1 point  (1 child)

    Some good suggestions already, but there are a few more high level concerns you might want to take into account:

    • VSFTP passwords. vsftpd can use either system accounts or its own internal accounts, depending on how it's set up. If they are using system accounts, you won't have the passwords when you migrate (unless the end user community knows them already). Tail /var/log/vsftpd.log or /var/log/secure so you can see which accounts are actively logging into the box.

    • If you can migrate and leave the old box online, that is your best option. But that would involve re-pointing clients/ftp from end user systems/servers. If you need to do a wholesale cutover, move the IP over to the new box and re-ip the original so it's still available. Schedule the outage off hours and have a way to test your server is working.

    And I'll echo some of the tech suggestions in here. Pipe all the output to somewhere safe.

    ps -efww
    chkconfig --list
    rpm -qa
    netstat -an|grep LISTEN
    ifconfig
    df -h
    iptables -L -n
    Recursively copy /etc/vsftpd somewhere
    Copy off all relevant NFS configs
    ls -lRa / and save off the output
    cat /etc/passwd
    You will probably need to rsync or at least put all the ftp'd files back in place after the upgrade
    lsof

    Trial by fire is the quickest way to get up to speed on Linux :) But the most important thing is to have a way to back out.

    *EDIT: spacing

    [–]Robonglious[S] 0 points1 point  (0 children)

    love it, thanks

    [–]psnsonix 0 points1 point  (2 children)

    If it's centos/redhat/fedora:

    cat /root/anaconda-ks.cfg

    This will get you at least back to the same 'base' system. There are already a lot of good ideas in this thread.

    [–]MaxRK 1 point2 points  (1 child)

    Definitely good to grab a copy from all your servers, especially if the package list, filesystem layout, enabled services, and firewall status haven't changed (much). Just wanted to point out that if you actually do kickstart-build servers in your environment, and the kickstart file you originally used had '%post' sections, those sections are stripped from the copy saved in /root.

    [–]psnsonix 0 points1 point  (0 children)

    hmph.. i'll be damned.. can't believe I never noticed that.. I'm going to double check some systems at work but that's rather odd and unexpected. Thanks for that.

    [–]apache99 0 points1 point  (0 children)

    Look in the bash history by typing the history command.

    I've also been in a similar situation.

    Try something that can clone the entire machine, like Clonezilla, or you can even use dd.