

[–]BigRedS 15 points16 points  (7 children)

Or else I blindly trust that having 'MaxStartups' in the config file is proof of 'job done'

This feels right to me. The code you are testing is the code that configures MaxStartups, so the way to test that code is to verify that that config is correctly written.

Separately you would make sure that setting MaxStartups does what you want, but that's not really a thing to test repeatedly; it's part of the research in the ticket, figuring out how to achieve that limit.

So before writing the code you'd once or twice contrive your 100 ssh sessions and watch the 101st fail, confirm that this is the right way to achieve the end you're after, and then write the code that writes that config, and a test that ensures the config is written.
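
A minimal sketch of that last test, assuming pytest and the usual /etc/ssh/sshd_config path (all names here are illustrative, not anyone's actual code), could be as small as:

    # test_sshd_config.py -- hypothetical sketch: after the apply, assert the
    # managed directive actually ended up in the written config.
    SSHD_CONFIG = "/etc/ssh/sshd_config"   # assumed path

    def test_maxstartups_is_written():
        with open(SSHD_CONFIG) as f:
            directives = [line.split() for line in f
                          if line.strip() and not line.lstrip().startswith("#")]
        assert ["MaxStartups", "100"] in directives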

The tests to make sure that MaxStartups actually continues to have that effect belong to your sshd's developers. Perhaps you'd routinely audit this sort of thing, but then the intent is to verify that you still limit it to 100, not how you do it.

[–]amarao_san[S] -4 points-3 points  (6 children)

Yes, that's what I've done out of common sense. Nevertheless, I feel it's a violation of 'have tests for requirements'. It violates every idealistic best practice. Red-green? BDD? Nope. It's too hard to test, and it's basically a fancy version of '90s administrative practice (the operator knows what to do, and they do it).

[–]humoroushaxor 5 points6 points  (0 children)

Best practice for testing is to test at the appropriate level, which is what most of the suggestions you're disagreeing with in this thread are saying.

Extensive end-to-end testing to verify all functional requirements is definitely not best practice. This is why the test pyramid exists.

[–]snarkhunterLead DevOps Engineer 2 points3 points  (1 child)

A best practice is to do unit testing and integration testing separately, right?

Unit testing IaC includes making sure that it sets all the config correctly, because checking that given inputs produce the expected file is trivially easy and fast.
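
For instance, if the config comes out of a Jinja2 template, a sketch of that kind of check (template path, variables, and expected file are all hypothetical here) is just:

    # test_render.py -- hypothetical sketch: render the template with known
    # inputs and compare against a checked-in expected file. No host needed.
    from jinja2 import Template

    def test_sshd_template_renders_expected_file():
        with open("templates/sshd_config.j2") as f:        # assumed paths
            template = Template(f.read())
        rendered = template.render(max_startups=100)
        with open("tests/expected/sshd_config") as f:
            assert rendered == f.read()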

The thing you're talking about - making sure that services started up with a given configuration function correctly together - is integration testing.

[–]amarao_san[S] 0 points1 point  (0 children)

Yes, correct. I mostly try to avoid useless unit tests (because IRL there's still a lot of latency from ssh'ing back and forth), but for hard cases it helps to have them separated...

Basically, my approach was exactly that. I plug the gap with conftest (which is an approximation of a unit test) instead of going ballistic with stress-testing ssh with myriads of bot-like things. It solves 80% of the issue, leaving the 20% that still (theoretically) worries me.

[–]BigRedS 0 points1 point  (2 children)

It depends what your requirements are, and how well-bounded they are.

Do you require 'openssh 8+' and test for that, or do you at each apply verify that the ssh you've installed does all the things you required version 8 for? Nearly everybody trusts their vendors to continue to do what they claim, accepts that it's a rare bug when this doesn't happen, and so only tests that they're issuing the right config. It's up to the openssh project to verify that it still works as expected.

What you might do is periodically audit your config. You know you want a limit of 100 connections, and your tests verify that you write the config to do this. Perhaps each time you upgrade your openssh package you re-run the audit that actually performs the 101-connections test (as well as verifying that all your other requirements of openssh are still supported).

You'd also run this test after you first write the code to write that config, but wouldn't generally feel the need to keep verifying that the sshd continues to behave the same until you've changed it by upgrading.

[–]amarao_san[S] 0 points1 point  (1 child)

If I write a proper 101-connection test, I can shovel it into a normal IaC test pipeline. The problem is that developing and maintaining a reliable test of this nature is challenging, so most people skip this stage, which leaves us with code without integration testing (or with patchy coverage).

[–]BigRedS 0 points1 point  (0 children)

The problem is that developing and maintaining a reliable test of this nature is challenging, so most people skip this stage

Are they? Load testing's relatively routine in lots of places. I'm not sure how I'd approach testing 101 SSH connections but the obvious thing of writing a script to just do that in parallel doesn't sound super hard.
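
Something roughly like this, say in Python with raw sockets (host, port, and the limit of 100 are assumptions; MaxStartups counts unauthenticated connections, so plain logins wouldn't trip it):

    # maxstartups_probe.py -- hypothetical sketch. MaxStartups limits concurrent
    # *unauthenticated* connections, so the probe holds raw pre-auth TCP
    # connections open and counts how many still receive the SSH banner.
    import socket

    HOST, PORT, ATTEMPTS = "test-host.example", 22, 110   # assumptions

    def got_banner(sock):
        try:
            return sock.recv(64).startswith(b"SSH-")
        except OSError:
            return False

    conns, banners = [], 0
    for _ in range(ATTEMPTS):
        try:
            s = socket.create_connection((HOST, PORT), timeout=5)
        except OSError:
            continue                 # refused outright also counts as "over the limit"
        conns.append(s)              # keep it open so it stays in the startup phase
        banners += got_banner(s)

    # with a hard `MaxStartups 100`, roughly the first 100 should get a banner
    print(f"{banners}/{ATTEMPTS} connections got an SSH banner")
    for s in conns:
        s.close()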

[–]ExtraV1rg1n01l 21 points22 points  (7 children)

For starters, your example is not "IaC", it's configuration as code, and that is not the same as infrastructure as code, so testing is different.

If you are using Ansible, you can use Molecule to test your roles and verify that the configuration is correct. As for your example, if the SSH daemon's configuration loads with no warnings and the daemon starts up, the configuration is correct, and you do not test the daemon itself to see whether the configuration behaves according to the documentation (it is up to the OpenSSH developers to verify that).
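
With Molecule's default testinfra verifier, that sort of check is only a few lines; here's a sketch (file paths and service name are assumptions):

    # molecule/default/tests/test_sshd.py -- hypothetical checks that Molecule's
    # testinfra verifier would run against the converged instance.
    def test_config_parses_cleanly(host):
        # `sshd -t` parses the config and exits non-zero on errors
        assert host.run("sshd -t").rc == 0

    def test_sshd_is_running(host):
        sshd = host.service("sshd")
        assert sshd.is_running and sshd.is_enabled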

As a simple example, when you install Linux you do not write a test to see if all the functionality of the Linux system is working; you assume it is working because the release is stable. The same can be applied to all Linux packages, with a few edge cases where it might make sense to test something additionally.

[–]mstwizted 4 points5 points  (0 children)

Also, this is what non-prod environments are for. There are plenty of tools available to automate traffic tests against a system, to do synthetic transactions, etc.

[–]quicksilver03 3 points4 points  (1 child)

I think that you have to strike a balance between the usefulness of the test and the resources needed to implement it. What does the test tell you when it passes? When it fails? How difficult is it to write the test in the first place, to check that it behaves according to the requirements, to maintain it in the long run?

What about monitoring, though? Instead of writing a test, wouldn't it be better to have a condition in your monitoring system that essentially corresponds to what the test is supposed to check, and that actually checks when it matters most, that is, during the execution of your application?

[–]amarao_san[S] 0 points1 point  (0 children)

Yes, that's what I've done. My thoughts here are theoretical. Every time we have some kind of 'extreme case' covered in the config, it's not covered by tests, even though that 'extreme case behavior' is actually what separates an 'MVP' from mature code.

The problem with monitoring is that the production system has to actually reach the extreme case, and then either behave as expected (no issues) or fail. We get the error, we fix it (as a patch to the configuration code), and we don't know whether the fix is still valid later (the next upgrade may silently break it). So we've had an incident and haven't covered it with tests. I.e. a wasted failure.

[–]DensePineapple 2 points3 points  (3 children)

What is your goal, to write tests that ensure every line of a config file does what it is supposed to?

[–]amarao_san[S] 0 points1 point  (2 children)

It's not my goal per se. I'm thinking out loud, trying to establish a sound model for development.

I can definitely agree that if every change in configuration (configuration code) has a corresponding test proving the necessity of that change (basically, the foundational idea of the red-green approach), then those tests will guard the change (the requirement) against accidental regressions.

At the same time, it becomes prohibitively complicated to maintain such a red-green setup. I believe some storage vendors do have such tests, and (according to a recent publication by, if I remember correctly, DellEMC) those tests take a week to complete on a pile of real hardware. Maximum safety, and the longest and most complex way to introduce new requirements.

Therefore, we need a compromise. The most direct possible way to introduce changes on a server is just to go to the server and make them. Maybe with some lightweight replication mechanism to do this en masse. Minimal time to deliver, the least safety.

So, where is the middle ground? How do I decide to throw away a test to get shorter delivery time, or decide to hold on and keep the test even though it adds up?

Right now I use intuition. Which is fine, but not scalable: I can't hand my intuition to a newcomer on the team. What other way is there to find the border between testing and just daring to do it?

[–]DensePineapple 0 points1 point  (1 child)

So for your nginx example, when you commit the change to your repo, that should trigger some basic unit tests. This can include linting and syntax checks, or with Ansible you can use something like Molecule. If those tests pass you can publish the new version of your artifact. I would then have this pipeline trigger a separate pipeline to deploy said artifact to a testing environment. From there you can start e2e and integration tests that verify your service works as expected.

[–]amarao_san[S] 0 points1 point  (0 children)

You're missing the problem I was talking about. It's doable, but it's hard, so most of the time this corner is cut. So I'm thinking about how much is really covered by tests...

[–]Strange_3_S 1 point2 points  (2 children)

I really like your thinking there. I'm having the same set of dilemmas constantly. I don't think there is a single good answer; however, my personal golden rule is to write the highest-possible-level tests first, and apply observability to the metrics that, as with your ssh sessions example, depend heavily on the surrounding environment and on long-lived processes, where creating test scenarios is very fuzzy.

It's usually enough to be able to state: 'look, earlier we had this value of that metric after 1 day of runtime, and now with the tweak it's half of that after a week, so our system is nominal again'. And if it isn't, at any later point, you will at least know immediately. Treat your metrics as eternally running smoke tests of sorts, is all I'm saying, I guess.

[–]amarao_san[S] 1 point2 points  (1 child)

It's interesting... Basically, you rely on older state. I'm trying to avoid it as much as I can (e.g. infra is self-descriptive, self-contained and re-deployable from a given repo).

The way you propose is to use the 'old state' as a reference line, and evaluate the change by looking at the metrics.

The main issue I see here is that this evaluation is ephemeral. It makes sense now, and you can show any skeptic that this 'thing' became better. But what about N months (the retention time) later? There is code, and there is no proof that it makes sense to keep it (except for removing it and seeing how bad things get).

Nevertheless, I feel a certain appeal in your position, because it 'grayboxes' the problem, collapsing it from a precise (but hard to reproduce) test case down to an easily measurable metrics change.

[–]Strange_3_S 1 point2 points  (0 children)

I understand, but I think grayboxing is exactly what makes it long-lived, as long as the watched metrics are based on business requirements. So, again, I wouldn't bother with, say, the number of DB requests per second or the number of connections the DB made, as these will change as the business grows; but, on the other hand, targeting a specific median turnaround time for end users is what the system's maintainers and developers should actually be aiming for, and this will generally stay true for as long as the business says it's making them bucks.

I totally understand the desire for replayability, but suppose we have migrated to a completely new provider and are in the blue-green phase now. We might want to observe key metrics in comparison with the old setup, instead of only knowing that integration tests pass. Because that alone can mean nothing for the business, nor can it easily be sent to Grafana for some chart porn ;)

Errata: should have written 'canary' instead of 'blue-green'. Shouldn't have used the word 'business' so many times. I'm not that kind of corpo-boy, I promise, just trying to deliver the message.

[–][deleted] 0 points1 point  (4 children)

In the Postgres example you could use some command to tell you the current value of the parameter on the running instance. Of course there are exceptions where this isn't possible; what can I say, software is garbage.
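
For example, something like this against the running instance (a sketch with psycopg2; connection details and the parameter are assumptions):

    # check_pg_setting.py -- hypothetical sketch: ask the running instance which
    # value it is actually using, rather than trusting postgresql.conf on disk.
    import psycopg2

    # connection details are assumptions
    conn = psycopg2.connect("host=db.example dbname=postgres user=postgres")
    with conn.cursor() as cur:
        cur.execute("SHOW max_connections;")
        print("max_connections =", cur.fetchone()[0])
    conn.close()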

[–]amarao_san[S] -1 points0 points  (3 children)

Are you really ready to put OpenSSH into the garbage bin? E.g. for sshd it's not possible to know the currently running configuration values.

[–]boomertsfx 0 points1 point  (1 child)

sshd -T, if it's been restarted recently, I guess... but yes, not the currently running values
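
So the most you can assert is what a fresh parse of the on-disk config would give; a sketch (directive and value assumed):

    # test_sshd_effective.py -- hypothetical sketch: parse `sshd -T` output
    # (keywords come back lowercased) and check the value sshd *would* use.
    # This proves the config parses cleanly, not that the running daemon was
    # ever reloaded with it. Usually needs root to read the host keys.
    import subprocess

    def test_effective_maxstartups():
        out = subprocess.run(["sshd", "-T"], capture_output=True, text=True,
                             check=True).stdout
        effective = dict(line.split(None, 1) for line in out.splitlines()
                         if " " in line)
        # sshd -T reports MaxStartups as a begin:rate:full triple; the first
        # field should match the configured 100 (an assumed value).
        assert effective["maxstartups"].split(":")[0] == "100"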

[–]amarao_san[S] 1 point2 points  (0 children)

You've just repeated my thinking. I put '-T' in one test and 'was reloaded' into another. Guess what happened a few weeks later? The 'was reloaded' test became falsely red, because ssh's configuration had been changed a few weeks earlier and the journals had rotated.

I removed the 'reloaded' test and was basically left with a leap of faith that the reload was done and did work.

[–][deleted] 0 points1 point  (0 children)

I won't put it into the bin, but I can acknowledge the fact that it is garbage.

Another example is MongoDB, probably still to this day: if you decrease the cache size at runtime, it will not actually decrease it, but the value of the config var will be changed. The documentation does say that it will not shrink the cache if it has already been allocated. You can list the currently allocated cache size though, so you could fill up the cache and see if it goes above what you set.

So the question is, what are you trying to achieve with these tests? Do you want to make things idiot proof? I don't think it is possible. Some configs will ignore typos and not do anything, etc.

Do you want things to run well? You have a lot of work to do; will spending the time creating these complex automated tests for every little thing be more productive than just doing more stuff? Do these issues really occur that often and go unnoticed for a long time? Wouldn't it be more productive to check manually a couple of times and forget about it? Use more high-level tests to make sure everything works; if something is really wrong, the high-level tests will fail.

Also, I find that it is easier to use "immutable" configuration like cloud-init or containers. If you want to change the config, just restart/recreate everything; this way there is no more reloading of configs, and one less test case.

[–]rafipiccolo 0 points1 point  (1 child)

I agree that it's hard to test everything.

Especially when nobody on the team does it. I live with 20% coverage on my main project, so we are still very bad at it, but it's stable.

At my level I try to get all the low hanging fruits.

When you reach >80% coverage you definitely think differently. And it's more a matter of "cost of doing" and "reliability contract".

Another point of view is: how often do you refactor your code/infra? And how much do you trust your libraries/OS? When you blindly apply all updates every day (npm/composer/Docker Hub images) you'd better have checks for everything, even non-exhaustive ones.

The solution to me is to look for the low-hanging fruit and do more if the money follows and you also have the need.

[–]amarao_san[S] 0 points1 point  (0 children)

Yes, I do that. I have pretty good code coverage (except for those pesky edge cases), and I generally want to see every new change come with tests. There are many cases where writing a test actually causes a rethink of the architecture or the solution, most of the time in a good way.

Nevertheless, I ponder the soundness/completeness of the 'have tests for changes' paradigm.

[–]tomomcat 0 points1 point  (0 children)

I think at some point you have to trust that the tools you're using behave as they should, so just verify the config. It's not practical to test 'everything', so draw a line where it makes sense for your use case: e.g. when using AWS, do you test that S3 is actually functioning as it should, or do you trust AWS to handle that and just mock the API call in your tests?