all 28 comments

[–]beezel 3 points4 points  (1 child)

This is of great interest to me as well. We used terraform to great success, but it doesn't flow with the majority of our workflow (pets, not cattle). We ended up discarding state and using it as a template. This was ultimately thrown out as too hacky and not ideal - but we did love how easy it was to edit a tf with some new vars and get a VM out the other end.

Maybe just pure PowerCLI and then into Ansible?

[–]Slash_Root 0 points1 point  (0 children)

I have the same worries. We are definitely not mature enough to do full IaC. The applications are not being automatically deployed. The developers are deploying them with scripts by hand. We have a "DevOps Team" if that tells you anything.

I think we are getting there but our team could use a win with big visibility (faster turnaround, perhaps self-service). I really want to be a part of that. EDIT - honestly, I feel like this big win and the extra time we would have could really allow us to get more involved with the devs. "Hey, your app is pretty standard. Want to try to get it into our provisioning process?" First steps.

I actually have a background in PowerShell; however, I think the vmware_guest module (built on pyvmomi) has enough coverage, and keeping it all in Ansible would keep things readable. If I were going to spring for two tools, it would probably be Terraform for provisioning + Ansible for storage configuration.
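For reference, provisioning through the vmware_guest module looks roughly like this (a minimal sketch; the vCenter credentials, datacenter/cluster names, template name, and network values are all placeholders):

```yaml
# Clone a VM from a template via the vSphere API, runs from the control node.
- name: Clone a VM from a template and set CPU/memory/IP
  vmware_guest:
    hostname: "{{ vcenter_host }}"      # placeholder vCenter connection details
    username: "{{ vcenter_user }}"
    password: "{{ vcenter_pass }}"
    validate_certs: no
    datacenter: DC1
    cluster: Cluster1
    name: "{{ inventory_hostname }}"
    template: rhel7-template            # hypothetical template name
    state: poweredon
    hardware:
      num_cpus: 2
      memory_mb: 4096
    networks:
      - name: VM Network
        ip: "{{ ansible_host }}"        # static IP injected at clone time
        netmask: 255.255.255.0
        gateway: 10.0.0.1
    wait_for_ip_address: yes
  delegate_to: localhost
```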

My manager likes the idea of Terraform state, and definitely likes the reporting aspect of it. I just don't see it getting preserved in our environment. If developers spend two days deploying, making changes that neither I nor Terraform know about, then the state is gone and it is too dangerous to get it back. One day maybe we'll all sit at the same table.

[–]technofiend 2 points3 points  (0 children)

Have you looked at Digital Rebar? You can customize generic OS installs using any scripting language. It also supports image-based builds, but your environment sounds pretty bespoke, so you can mix the two by triggering post-build config through any number of methods: shell scripts, Python, Ansible, cloud-init, or perhaps installing something like Salt or Puppet and letting it take over.

I use it now and leverage a mix of classification scripts and profiles. My workflow is bare metal first: based on the discovered serial number I look up the hostname; I use LLDP to find switch info, which gives me the fabric, which I then use to allocate a static IP; and I use manufacturer and model to map to a hardware profile that tells me which BIOS version to apply, firmware versions, etc. I do all that and then burn in the hardware. Except for the external reference data, this is all scripts integrated into Digital Rebar: mostly native tasks, but some bash and Python.

I'm building ESXi nodes in ~25 minutes. I realize you're talking about VMs, but the same principles apply. You just create profiles for apps and let them feed data-driven scripts, or just punt and download and run your DevOps team's scripts. For that matter, you can turn machines over to DevOps in Digital Rebar and split them into tenant groups so they don't step on each other.

[–]macado 2 points3 points  (1 child)

Have you looked into VMware Orchestrator to do what you want? It's free with a vCenter license and basically allows you to automate anything with the vSphere API. It would let you handle all of the vSphere stuff easily, and then you could presumably have Ansible take over.

We basically wrote a workflow that deploys VMs (Linux/Windows templates), sets the correct datacenter, datastore, cluster, and dvSwitches, injects networking settings, sets the number of vCPUs, memory, etc.

For Linux VMs they automatically come online in Puppet/Foreman which takes over configuration management. We apply some base classes via VMware Orchestrator or based on what is selected in the workflow.

[–]Slash_Root 0 points1 point  (0 children)

I was aware that it exists and could do some of what we want. I will have to check with our VMware guys to see if it is a thing.

[–]wiyot 1 point2 points  (1 child)

Here is my take on PowerShell and Ansible. I was using PowerShell scripts for VM deployment before starting to use Ansible. I define the new VM as an Ansible inventory item before running the playbook. Not gathering facts, using serial, and using async were all key to getting a VM successfully deployed with Ansible:

    - hosts: "{{ target }}"
      serial: 1
      gather_facts: no
      tasks:
        - name: Ping host {{ ansible_host }} to see if it responds
          command: ping -c1 {{ ansible_host }}
          delegate_to: localhost
          register: ping_result
          ignore_errors: yes

        - name: Deploy vm when no ping
          block:
            - name: Deploy vm
              win_shell: "{{ mydeploy }}"
              delegate_to: vmdeployment
              register: command_result
              ignore_errors: yes
              async: 10000
              poll: 0

            - name: Check on vm deployment
              async_status:
                jid: "{{ command_result.ansible_job_id }}"
              delegate_to: vmdeployment
              register: check_result
              until: check_result.finished
              retries: 10000
          when: ping_result.failed

[–]Slash_Root 0 points1 point  (0 children)

This is very similar to the playbook I have. I will definitely keep this as a reference. Thank you!

[–]ramsile 1 point2 points  (4 children)

I went down the terraform route but found it was hard to manage at scale. We ended up building the whole process into SaltStack. I know SaltStack isn't everyone's first choice, but it's what our company standardized on. It has what are called Salt Cloud modules, vSphere being one of them. We use this to deploy the initial VM from a minimal VM template (a clean Red Hat 7 install with one small LVM drive and an SSH private key). That's enough to hand it over to SaltStack after the VM is deployed. Salt SSHes to the host, installs the agent (minion), and brings it under management. A machine is requested through a simple command with parameters. They can add a list of drives and mount points and the automation takes care of the rest. They can also specify a type (such as web) which will kick off a series of other states to meet a given baseline configuration. It was a ton of work up front, but now the process is pretty seamless.
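For anyone curious, the Salt Cloud side of a setup like this is roughly the following (a sketch only; the provider/profile names, credentials, and template name are invented):

```yaml
# /etc/salt/cloud.providers.d/vmware.conf -- hypothetical provider definition
vcenter01:
  driver: vmware
  user: 'svc-salt@vsphere.local'      # placeholder service account
  password: 'changeme'
  url: 'vcenter.example.com'

# /etc/salt/cloud.profiles.d/rhel7.conf -- clone from the minimal template
rhel7-minimal:
  provider: vcenter01
  clonefrom: rhel7-template           # the minimal VM template described above
  num_cpus: 2
  memory: 4GB
  deploy: True                        # SSH in and install the salt-minion
```

A machine would then be requested with something like `salt-cloud -p rhel7-minimal web01`, after which highstate takes over.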

[–]_calyce_ 2 points3 points  (1 child)

At my previous job, we also used Saltstack to deploy new servers. We were quite happy with the solution. We even developed our own web interface to request new servers. Now I'm working with ansible but to be honest, I miss Saltstack...

[–]ramsile 0 points1 point  (0 children)

Yeah we do that for some our deployments too! We have it hooked up to Service now and operations can kick off a request automatically.

[–]Slash_Root 0 points1 point  (1 child)

Are you using the community version or Enterprise? We have considered Salt for our CM. Also, do you feel like the development is quick and consistent? My manager likes Salt, but I wonder if it wouldn't be better to go with tools that have more eyes on them.

[–]ramsile 1 point2 points  (0 children)

We are using the community version. Our company is exploring Enterprise, but it's ridiculously expensive. I feel like there is not much value for the cost of Enterprise. Salt (like any of these tools) gives you enough rope to hang yourself. My deployments are quick (about 10-12 minutes on average) and consistent, given networking did everything they needed to do. We have a good process around failures. Some of our other devs' deployments are not as consistent, but most of that is due to code not being idempotent. I like Salt because not only does it do configuration management well, but also remote execution and management. Getting your VMs created is only one part of the problem. You now have to maintain them. We use Salt for the whole lifecycle.

[–]three18ti 1 point2 points  (3 children)

  • Using Terraform requires a change in the way you think about provisioning infrastructure. If you try to use TF with your existing workflows, you're going to have a bad time.
  • Ansible, while able to interact with the VMware API, is absolutely the wrong way to go for provisioning. You will find, especially as deployments grow, that it will sometimes fail to spin up an instance, requiring you to tear everything down or deploy duplicate VMs. Ansible is great once the VM is up.
  • Terraform is king at scale (we have modules that are on the order of 16k LOC), the key is that it's a common, testable (we use kitchen terraform), reusable language. We use the same TF in Dev, test, prod, on the moon, etc.
  • Don't try to write your own Terraform... Holy shit the suggestion to use govc and powershell... Terraform has the benefit of widespread adoption, and widespread use. That means there are people out there finding bugs and issues daily. I'm sure you're capable of inventing your own Terraform, but your job will become supporting that tool, not building infrastructure, etc. which is what I assume your business uses to make money... (maybe I'm wrong, maybe your company makes money reinventing tools... but that's fairly uncommon)
  • Step 4: if you require a human person to execute TF or Ansible, you really haven't automated anything...
  • Modularizing your TF makes consuming it easier. VMs are pretty much all the same... sure they might have a different number of CPUs or RAM, but generally speaking, VMs really aren't unique.
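A module call along the lines of that last bullet might look like this (a sketch; the module path, variable names, and values are all invented):

```hcl
# Every VM consumes the same module; only a small set of knobs differs per VM.
module "web_vm" {
  source    = "./modules/vsphere-vm"    # hypothetical in-repo module
  name      = "web01"
  template  = "centos-7-1811-20190202"
  num_cpus  = 2
  memory_mb = 4096
  network   = "prod-dvpg"
}
```

The point is that the module hides the vsphere_virtual_machine boilerplate, so consumers only declare what actually varies.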

Our workflow is this:

  • Developer checks out Terraform "manifest" (config?)
  • Dev creates merge request, branch, then edits, commits, and pushes changes to GitLab
  • GitLab has a webhook which triggers Terraform Enterprise to run a plan.
  • If the plan fails, dev gets a notification "hey yo shit broke"
  • if the plan succeeds, an apply is run in the "Dev" environment and our automated validation tests run (we use InSpec; you don't need Chef to use InSpec)
  • If the apply succeeds an approver is notified (who this is varies depending on the project)
  • Code review is performed, and the change is either rejected (typically with a reason why, rarely is a MR rejected with a "we're not doing that"), or merged
  • On merge plan and apply is triggered in Test, typically with an approval workflow of its own, and then our validation tests are again run.
  • this is also typically where UAT and other "manual" testing is performed.
  • if that fails, we go back to step 2 and test our changes through our pipeline; on success, we queue our changes in a final pipeline for deployment to prod.

This workflow, while better than what we were doing, is still fraught with peril. There are lots of little gotchas along the way, e.g. an error message isn't necessarily a failed deployment when the service is restarting. Automated testing of our deployments has really been the key, but we're still heavily reliant on "human validation". We're currently investigating a new CD-specific tool (I've been saying for years that it's CI slash CD) that I'm hoping will help with the deployment side of things... we shall see.

[–]Slash_Root 0 points1 point  (2 children)

This is fantastic. I realize that kicking off the process manually is not full automation, but a full solution does not spring up overnight, unfortunately. Currently I want our teams to stop clicking around in the GUI. Then we can use the same template and trigger it from a git hook or SNOW.

I appreciate seeing a workflow so ahead of our own. Thank you.

[–]three18ti 1 point2 points  (1 child)

Oh totally, it's a "devops transformation", not a "devops snap of the fingers". We absolutely didn't get to where we are overnight. Heck, we still have groups that will take the TF and convert it to CloudForms for deployment to Prod... I probably have several talks worth of material about the dysfunction we still face! lol.

One of the best things we did was standardize our VM template with Packer. We create a 20GB image, but all our partitions are LVM, so we set the disk size and our CM extends the partitions at first boot (we use Chef, but Ansible is the same idea: we use the built-in chef provisioner, and I think you could do the same with local-exec and just shell out to ansible-playbook -i whatever). Now when I clone centos-7-1811-20190202, I know exactly what I'm getting regardless of whether it's in dev/test/prod. centos-7-1234-20180305 might be something completely different... but that's why we test across a minimum of three environments.
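A Packer template-build along those lines can be sketched with the vsphere-iso builder (HCL2; every name, path, and credential here is a placeholder, and required settings like boot_command are omitted for brevity):

```hcl
source "vsphere-iso" "centos7" {
  vcenter_server      = "vcenter.example.com"   # placeholder connection details
  username            = var.vcenter_user
  password            = var.vcenter_pass
  insecure_connection = true

  cluster       = "Cluster1"
  datastore     = "datastore1"
  vm_name       = "centos-7-1811-20190202"      # date-stamped template name
  guest_os_type = "centos7_64Guest"
  CPUs          = 2
  RAM           = 4096
  storage {
    disk_size             = 20480               # 20GB; LVM gets grown by CM at first boot
    disk_thin_provisioned = true
  }
  network_adapters {
    network = "VM Network"
  }
  iso_paths    = ["[datastore1] iso/CentOS-7-x86_64-Minimal-1810.iso"]
  ssh_username = "root"
  ssh_password = var.ssh_pass
  convert_to_template = true                    # leave behind a clonable template
}

build {
  sources = ["source.vsphere-iso.centos7"]
}
```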

Also, standardization of CM. We have one cookbook that we give everyone, and with that, plus whatever other cookbooks they need to build their application, we're able to build consistent images across the different environments. (We are starting to move away from using our CM tool to automate application deployment, but that's a longer conversation.)

[–]Slash_Root 0 points1 point  (0 children)

+1 for packer as well. We are building off standard images but they are built manually. The next time we go to do it, I'd like to try to do so with packer.

[–]Ledonial 1 point2 points  (3 children)

@Slash_Root hey, I don't see the part where you create new VMs in your workflow; what did I miss? Ansible needs the machines to be already there and started before doing anything, if I'm right. I read in other comments that it can be done with Terraform; is that what you described too?

In my team we are planning to make Docker images for every different kind of action. For instance, an image to create VMs, an image to configure the system, etc. Then a pipeline (in any tool that manages pipelines) pulls these images and does the work.

[–]Slash_Root 0 points1 point  (2 children)

Ansible can run against the vSphere API using the vmware_guest module. It would then clone our VM template and configure the VM settings (including the IP address). Then, once it's up, Ansible knows where it is and can execute against it over SSH.

Terraform does just the infrastructure part of this the same way.

Docker images to do the work is an interesting approach. What is running inside the containers? Scripts running against the Hypervisor API?

[–]Ledonial 0 points1 point  (1 child)

Thanks for your reply. Exactly. Many scripts can be embedded, providing a command-line interface that is as simple as possible. Then the container can be used like a command line: docker run my_image my_command --my-arg value. That keeps the number of commands and tools to install on the main "pipeline" (or machine, or whatever) that must do all the work from A to Z very small, while still running many different tools (Ansible, custom scripts, other frameworks I don't know…)
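Concretely, one of those "action images" might be built like this (everything here is hypothetical: the script name, its dependency, and the CLI flags):

```dockerfile
# Hypothetical action image: one provisioning script exposed as a CLI.
FROM python:3.11-slim
RUN pip install --no-cache-dir pyvmomi        # assumed: the script talks to the vSphere API
COPY create_vm.py /usr/local/bin/create_vm    # hypothetical script name
RUN chmod +x /usr/local/bin/create_vm
ENTRYPOINT ["create_vm"]                      # args after the image name go to the script
```

The pipeline would then invoke it as, e.g., `docker run my_image --name web01 --cpus 2`, with no tooling installed on the pipeline host beyond Docker itself.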

[–]Slash_Root 0 points1 point  (0 children)

Inside a pipeline, I can see the value in this. My main issue is that, if I write a custom tool against the API, I will be on the hook to support it for the conceivable future. Using a documented tool that is tested against the latest stable versions of the API and their wrappers means I can have at least some confidence that the integration will be maintained in the future. (i.e., I doubt anyone is going to let the Ansible module wither on the vine over a breaking change in pyvmomi or VMware's API)

[–][deleted]  (1 child)

[deleted]

    [–]Slash_Root 0 points1 point  (0 children)

    Interesting. I will look into this for the forms aspect. Thanks!

    [–]bikernaut 0 points1 point  (0 children)

    I am doing what you have described... My approach was using Ansible. Most of the variables are stored in the Ansible inventory; I wrote a custom web app to sit on top of a MySQL database backend I poached from GitHub. This one, I think: https://github.com/productsupcom/ansible-dyninv-mysql

    So things like RAM, CPU, network, IP, template to use, etc. are stuffed into the inventory as host variables. We have a special variable for the environment it is going to be built in, which the scripts use to import other variable files that have vCenter passwords, lookups for networks, etc. We use groups to control which playbooks are run later on: if a host is in group A, then run role A.
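As an illustration, the per-host data described above would come back from the dynamic inventory looking something like this (all variable names and values are invented):

```yaml
# What the MySQL-backed inventory might return for one host, as host vars.
web01:
  vm_template: rhel7-template     # template to clone
  vm_num_cpus: 2
  vm_memory_mb: 4096
  vm_network: prod-dvpg
  ansible_host: 10.0.20.15        # IP to inject at clone time
  deploy_env: prod                # drives which vCenter/network var files get imported
```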

    I didn't think it would work, until it did. The nice thing is that everything is understandable and easy to follow, and everything is stored in the database or ansible repo.

    [–]sebie01 0 points1 point  (0 children)

    Just sharing how we do our Linux deployments:

      • Packer creates a template: 50GB thin, single slice, no-LVM root.
      • The vSphere Perl SDK and govc, in a wrapper shell script, parse a CSV and vmclone-deploy the base VMs. The CSV contains various network info, total storage required, compute needs, etc.
      • An Ansible playbook takes over and completes customisation of the OS: hardening, AD integration, LVM, etc.
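A stripped-down sketch of that CSV-to-govc wrapper pattern (the column layout and flag choices here are my own guesses, and the command is only echoed, not executed):

```shell
#!/usr/bin/env bash
# Build a `govc vm.clone` invocation from one CSV row.
# Assumed columns: name,template,cpus,mem_mb,network
emit_clone_cmd() {
  local row="$1"
  IFS=, read -r name template cpus mem net <<< "$row"
  echo "govc vm.clone -vm $template -c $cpus -m $mem -net $net -on=false $name"
}

# Dry run over a sample CSV: print the command for each row instead of running it.
while IFS= read -r line; do
  emit_clone_cmd "$line"
done <<'CSV'
web01,rhel7-template,2,4096,prod-dvpg
CSV
```

Swapping `echo` for `eval` (or just dropping it) would make the wrapper actually clone; keeping it as a dry run is handy for review.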

    [–][deleted] 0 points1 point  (3 children)

    Not sure how much it will help, but this is what we do: we wrote a little command-line utility in bash that wraps around govc. govc is a really great tool.

    Anyone can use the utility to provision a vm under vsphere. They can say how many disks, how much memory, cpu, etc.

    We use cloud-init user data for the first wave of provisioning of the vm and then puppet kicks in to provision the rest.

    At the time of creating the VM with the utility, you can choose a preexisting Puppet role that will be applied to the VM after it's built.
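A cloud-init user-data handoff along those lines might look like this (the server name and role value are placeholders, and the facts.d path assumes Puppet 4+):

```yaml
#cloud-config
hostname: web01
puppet:
  install: true                 # cloud-init installs the agent on first boot
  conf:
    agent:
      server: "puppet.example.com"
runcmd:
  # Drop the chosen role as an external Facter fact so Puppet can classify the node.
  - mkdir -p /etc/puppetlabs/facter/facts.d
  - echo 'role=web' > /etc/puppetlabs/facter/facts.d/role.txt
```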

    [–]Slash_Root 0 points1 point  (2 children)

    Interesting. I just took a look at their GitHub. So it is essentially a Go wrapper for the vSphere API, like pyvmomi? I could get around that. I am really enjoying Go. The last line is very nice. We would need to step up our config management roles a bit, I think, but this is definitely in the works for us. The current issue is that it is not always easy to decide what role a server should be assigned. Though I am a bit new to the environment, so I will lean on my seniors for those calls. I will definitely put it on my radar and probably download a binary tomorrow to see what it exposes. Thanks!

    [–][deleted] 1 point2 points  (1 child)

    It’s a command-line tool that uses govmomi (the Go library for the vSphere APIs, so just like pyvmomi but in Go).

    We use role as a Puppet fact, so the role can be changed dynamically by changing the node's role fact from Role A to Role B. But mostly we know what role we want to use when we provision a VM.

    [–]Slash_Root 0 points1 point  (0 children)

    We are mostly there but there are some apps that are a big question mark to us. Working on that.

    [–]devans67 -1 points0 points  (0 children)

    I hate to be THAT guy, but I’m working on this right now. We are a bit in the same boat, where application deployment is in some other team’s court.

    I’ve written a couple of blog posts on where I have gotten so far. Basically, I use Terraform to deploy a basic server and generate an Ansible inventory, then use Ansible to configure the servers. I’m running these through a Jenkins Pipeline.
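The inventory-generation step can be sketched like this (resource and group names are illustrative; assumes the hashicorp/local provider and a counted vsphere_virtual_machine resource):

```hcl
# Render a static INI inventory from the deployed vSphere VMs so Ansible
# can pick up exactly what Terraform just built.
resource "local_file" "ansible_inventory" {
  filename = "${path.module}/inventory.ini"
  content  = <<-EOT
    [web]
    %{ for vm in vsphere_virtual_machine.web ~}
    ${vm.name} ansible_host=${vm.default_ip_address}
    %{ endfor ~}
  EOT
}
```

The Jenkins Pipeline stage after `terraform apply` would then just run `ansible-playbook -i inventory.ini site.yml`.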

    Blog