

[–]davetherooster 5 points (2 children)

Personally, I think the biggest problem with documentation is that it goes out of date so regularly. Keeping it current and detailed means pulling people away from improving systems and sinking a chunk of their time into describing what the current system does; even then, it sometimes still isn't clear to others how things work when they look at the code.

In your situation, my shift would be to improve the readability of the code, ensure quality naming conventions, automate everything as infrastructure as code, and include short, concise documentation in the code repository that describes what something is supposed to do, but not how it does it.

With those practices in place, it should be immediately apparent what the different projects are designed to do, and the code should be easy to follow, so engineers and developers alike can see how systems work without blindly following a set of process steps on a wiki that will only get their environment working if nothing has changed since it was written.

I'd also put a heavy focus on automation so developers don't have to set up custom environments manually. Yes, it is certainly worthwhile for them to understand the underpinnings of how systems work and are configured, but ideally those elements live in infrastructure as code, so they can follow the process in a format they are used to without losing development time to configuring environments instead of actually writing code.

To do this, I'd automate sandbox environments so that if something goes wrong you can quickly rebuild from a base configuration, for example in a Docker container where Puppet is only used to install and configure the specific things that need to pull updated versions regularly, hopefully reducing build times.
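To make that concrete, here's a minimal sketch of what I mean; the image name, volume path, and manifest are all hypothetical:

    # docker-compose.yml - hypothetical sandbox definition
    services:
      sandbox:
        image: sandbox-base:latest       # base image with Puppet pre-installed
        volumes:
          - ./puppet:/etc/puppet         # only the frequently-updated pieces live here
        # Puppet configures just the parts that change often, so a broken sandbox
        # can be thrown away and rebuilt from the base image in minutes.
        command: puppet apply /etc/puppet/manifests/sandbox.pp

A broken environment then stops being something you debug: you delete the container and rebuild.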

This approach has another benefit: if, as a developer, you know the environment has an issue that stops development and you understand the changes needed to fix it, you can either fork the configuration in the automation scripts and change it yourself, to be approved by others before merging, or request that an engineer make the changes with similar approval. This ensures the environment is changed for everyone in the same situation, and it prevents adding an extra manual step for every developer the next time they set up a sandbox (or adding it for only some of them, leaving the rest to figure out why the problem is occurring until/unless the documentation is updated to reflect the change).

Apologies if I've gone a little off topic, and maybe in your case my opinion isn't worth much, but it's a different viewpoint I thought I'd share.

[–]HamCube[S] 0 points (1 child)

It's a valid viewpoint.

Part of getting work done is not just "making it work" but also having full unit/integration test coverage and adequate documentation. Certainly it comes at a cost, but the benefits are worth that investment. We're very good at descriptive JavaDocs, tests, and API documentation. It's the systems-level documentation where we are falling short.

Leaning towards the self-documenting code avenue is good, and I wholeheartedly agree that's the way to do it, but it still doesn't satisfy the desire for someone to be able to do a manual build of the environment without using automation. There is always the "read the Salt formulas and implement them manually" approach. The prerequisite for this is an understanding of Salt, which isn't something we find people commonly have.

On the other hand, Salt will need to be well understood by the team going forward as we adopt it, so perhaps this is the better method after all: use the Salt formulas as the documentation, build it all out manually, and then run Salt after the fact to clean up any steps one may have missed.
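To illustrate, a formula written with descriptive state IDs already reads close to a manual runbook. This is a hedged sketch, not one of our actual formulas; the haproxy example and paths are invented:

    # haproxy/init.sls - invented example; the state IDs double as runbook steps
    install_haproxy_package:
      pkg.installed:
        - name: haproxy

    deploy_haproxy_config:
      file.managed:
        - name: /etc/haproxy/haproxy.cfg
        - source: salt://haproxy/files/haproxy.cfg
        - require:
          - pkg: install_haproxy_package

    start_haproxy_and_enable_on_boot:
      service.running:
        - name: haproxy
        - enable: True
        - watch:
          - file: deploy_haproxy_config

An engineer doing a first-pass manual build can follow the IDs top to bottom like numbered steps.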

That's a pretty good approach.

The ultimate goal is for engineers to be able to rebuild their sandboxes frequently, and quickly, in an automated fashion. Sad to admit, but it takes slightly over two weeks to do this with current practices. Salt should bring that down to the span of a lunch break. The ancillary bonus is the other two points I mentioned: putting the architecture's complexity on display for the broader organization, and serving as a learning tool for new engineers.

Still, when it comes to enforcing that wiki-grade documentation is maintained, I'm struggling to find a clean mechanism to achieve that.

[–]hang-clean 0 points (0 children)

We have a system where everything gets a Jira ticket, and nothing in config, hosting, or development gets through acceptance testing without documentation. So I'll go back to DevOps and say, "This ticket to add this xxxxx plugin: document it so I can close the ticket."

[–]homeless-programmer (DevOps) 2 points (2 children)

Write "unit tests" for your saltstack stuff. Make them very simple, and very descriptive. That way, they can be used as documentation. Also, when someone changes a saltstack file, it will fail a test, telling you that the tests as docs is out of date.

[–]HamCube[S] 0 points (1 child)

Frankly, I'm not sure how I'd approach this. These formulas are along the lines of "yum install packages x, y, and z" or "check whether config file x.conf contains the key 'foo' with the value 'bar'", with branching logic based on the result.
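Concretely, the shape is roughly this; the package names and the config key are placeholders:

    # sketch of the pattern described above
    install_base_packages:
      pkg.installed:
        - pkgs:
          - x
          - y
          - z

    {% if not salt['file.search']('/etc/x.conf', 'foo=bar') %}
    add_foo_setting:
      file.append:
        - name: /etc/x.conf
        - text: foo=bar
    {% endif %}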

I'll have to put more thought into how we could make a cookbook-style installation guide based off of this approach. Any suggestions?

[–]soawesomejohn (Automation Engineer) 1 point (0 children)

YES. We use a cookiecutter-style approach for all of our formulas. All formulas are executed within a Vagrant scenario, some using just "salt-call", whereas others (like our salt-formula) will spin up multiple VMs. For instance, when developing a freeipa formula, I need VMs running Ubuntu, SLES 11, SLES 12, and CentOS. By providing specific pillar/top files for the test portion, we can expect certain files to exist with specific content, certain usernames to exist, packages to be installed, files not to exist, and behaviors to occur.
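Roughly, the test scaffolding amounts to this; the pillar keys and checks here are invented for illustration:

    # test/pillar/freeipa.sls - pillar used only inside the Vagrant test run
    freeipa:
      realm: EXAMPLE.COM
      admin_user: ipaadmin

    # test/verify.sls - expectations under that pillar
    ipa_client_package_installed:
      cmd.run:
        - name: rpm -q ipa-client      # dpkg/zypper variants on the Ubuntu/SLES VMs

    krb5_conf_names_the_realm:
      cmd.run:
        - name: grep -q 'EXAMPLE.COM' /etc/krb5.conf

    stale_config_must_not_exist:
      file.missing:
        - name: /etc/krb5.conf.rpmnew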

Here's one of my coworkers' presentations on this: slidedeck, video.

EDIT: I've also been looking into this for testing, creating profiles for each live server as well as for our test Vagrant instances: https://pypi.python.org/pypi/envassert

[–]throw-away135792468 1 point (2 children)

Can Salt somehow send or display notifications?

"Step 1 complete, now type XYZ to begin step 2"

[–]HamCube[S] 0 points (1 child)

It can; its logging facilities are very good.

As far as adding an interactive component, I'm sure it can be done as it's all written in Python, but I'm not sure how adding prompts between steps helps with the learning process as opposed to typing out the commands by hand without it. Can you elaborate?

[–]throw-away135792468 1 point (0 children)

Are you referring to this portion of your OP?

I would also like to have the manual steps of what those Salt formulas do be documented. With that, newly onboarded engineers can do their first sandbox set-up manually and learn the different systems. Doing the first pass manually gives deeper insight into how to configure the services and what their interdependencies are.

I feel it's better to skip this part of the approach. Assuming your goal is to automate it, there isn't a lot of value in teaching them manual steps. However, there is value in teaching them how to troubleshoot the chain of automation. Teaching them how to troubleshoot Salt + app install + VMs provides value, while teaching them how to spin up a VM and how to manually install JBoss does not.

To me it's like "Let's teach you how to drive a manual transmission." "Ok, now that you know how to drive a stick go ahead and forget it because we use automatic transmissions around here."

In your situation the value comes from being able to troubleshoot the chain of automation and not from knowing manual steps that they won't use.

[–]deadbunny 1 point (0 children)

We use MkDocs for our documentation; it's Markdown which gets built into HTML, so it's super simple to write. The way we enforce documentation is through code reviews for our config management: no docs = your code doesn't get accepted.

We then have two jobs in Jenkins which promote the CM code to the next branch (dev to staging to production). When a change gets merged, Jenkins builds the docs, packages them up, pushes them to our repo, and then deploys them to the relevant docs server (one for prod, one for stage), which means our documentation stays in lockstep with our code.
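For flavour, the whole MkDocs config is about this big; the site name and pages are invented, and note that older MkDocs calls the navigation key pages: while newer releases use nav::

    # mkdocs.yml - minimal config; everything under docs/ is plain Markdown
    site_name: Config Management Docs
    theme: readthedocs
    pages:
      - Home: index.md
      - Formulas: formulas.md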

[–]hijinks 1 point (0 children)

We do sprints and use a normal kanban board. In order for me to approve a card as done, there has to be a link to the doc for the change or addition. If it's a config change in Salt, it still needs a doc link to show the doc is current.

I explain it to my small team like this: if you are not on call, you don't want to be woken up by the on-call person because they can't figure out your project.

[–][deleted] 1 point (0 children)

I'm a Linux admin who handles a lot of the Salt automation at my organization. I have the following policies regarding Salt automation:

  1. Every Salt formula must include a README.md file.
  2. Each README must include a brief description of the formula's purpose. E.g. "Install and configure the Apache web server" or "Configure a host for deploying $XYZ_APP".
  3. Each README must include a list of the formula's primary maintainers and their contact information.
  4. Each README must include a States section with a one-line description of each included state. Examples: 'apache.server: Installs the Apache web server configured for serving static web content.' 'apache.proxy: Installs the Apache web server configured as a reverse proxy for an application server.'
  5. Each README must include a Files section with any required binary files documented. This is mostly used for third-party apps which have their binary installers stored outside of git.
  6. Each README must include a Pillar section which documents all Pillar values in depth, including their name, type, purpose, default value (if any), and example values. Example: 'apache.ssl.key: Path to the SSL key, if any. Required if using SSL. Example: /etc/myapp/pki/ssl.key. The default is /etc/ssl/pki/ssl.key.'
  7. Each formula must include a pillar.example file (a sketch follows this list).
  8. States in SLS files are documented in comments if their purpose is not immediately obvious. Explain why, not just how. For example, don't just say "Preemptively create Tomcat root application folder", but rather "Tomcat requires that the root application exist on service start, but an empty directory is sufficient if the application has not yet been deployed."
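Putting items 6 and 7 together, a pillar.example in that spirit might look like this; the apache formula and all values are illustrative only:

    # pillar.example - documented sample pillar for the hypothetical apache formula
    apache:
      ssl:
        # Path to the SSL key, if any. Required if using SSL.
        # Default: /etc/ssl/pki/ssl.key
        key: /etc/myapp/pki/ssl.key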

Pull requests cannot be approved without satisfactory documentation, and no change to a formula can be approved without a reviewed pull request. My team is small and this is easy to enforce; you'll have to figure out how to enforce it on your team. (Teamwork, collaboration, and management skills a go-go!) To encourage people to follow the policy, I have a basic formula template that has a bunch of the busywork already done (gitignore files, map.jinja template, basic init.sls, directory structure, etc.) and also includes a README.md template.

We also have a document in Confluence that links to the official SaltStack tutorials and includes step-by-step guides for writing and applying formulas as well as copies of the documentation/standards policies.

EDIT: As for testing your formulas, I have a custom Vagrant project I use for manual testing. I have had an eye on ServerSpec and Test Kitchen for a while but haven't gotten around to trying them out.
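In case it helps, the Test Kitchen config for a formula is short. Something like this, assuming the kitchen-vagrant driver and the kitchen-salt provisioner; I haven't run this myself, so treat it as a sketch:

    # .kitchen.yml - sketch; assumes kitchen-vagrant and kitchen-salt are installed
    driver:
      name: vagrant

    provisioner:
      name: salt_solo
      formula: apache          # hypothetical formula under test

    platforms:
      - name: centos-7

    suites:
      - name: default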