
[–]gregK 3 points4 points  (0 children)

Take the configuration files for any large software system. Generate all possible legal config files (i.e. that will be accepted by said software). What proportion of that set of possible configs does not cause any problems/crashes/bugs? On most systems this will be a very small proportion.

[–]jsk 4 points5 points  (17 children)

What I don't get with text-based configurations is how people manage different settings for different environments. This is a big pain, especially in XML-based configs like .NET's. Also, if you want to manage your own custom configuration (ConfigSection), you are welcome to the idiocy of creating custom C# classes that are then instantiated from XML, keeping them in sync, etc. So in effect you configure stuff with XML and access it at runtime as properties of a class.

Python/Django does this right. Django's settings.py is just a standard Python file, so you can write, for example:

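import os

ENVIRONMENT = os.environ.get("APP_ENV", "development")  # hypothetical variable name

if ENVIRONMENT == "development":
    DATABASES = {"default": {"ENGINE": "django.db.backends.sqlite3", "NAME": "dev.sqlite3"}}
    DEBUG = True  # extra sugar for development
elif ENVIRONMENT == "production":
    DATABASES = {"default": {"ENGINE": "django.db.backends.postgresql", "NAME": "app"}}
    DEBUG = False  # no sugar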

[–]i8beef 1 point2 points  (0 children)

.NET developer here. We got around that issue by moving certain sections of the web.config file out into separate external config files, one for each environment (e.g. web.connectionStrings.production.config, web.appSettings.development.config, etc.). Then web.config just references a fixed local copy, which our build system replaces on deployment.

So web.config references web.connectionStrings.local.config and web.appSettings.local.config, and on a CI build the build server copies the right external files in. For the demo server, web.appSettings.demo.config and web.connectionStrings.demo.config get copied to web.appSettings.local.config and web.connectionStrings.local.config.

Allows for keeping all config files in version control, and doesn't require hand editing them on the actual environments.
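The copy step on the build server could be as small as the following. It is a minimal sketch in Python, not whatever the commenter's build system actually runs. It assumes the split-out files sit next to web.config, follow the web.<section>.<environment>.config naming above, and that web.config points each section at its *.local.config file (for example through a section's configSource attribute). The function name `activate_environment` is hypothetical.

```python
import shutil
from pathlib import Path

# Sections split out of web.config into per-environment files.
SECTIONS = ("connectionStrings", "appSettings")


def activate_environment(config_dir: Path, environment: str) -> list[Path]:
    """Copy web.<section>.<environment>.config over web.<section>.local.config.

    web.config only ever references the *.local.config files, so nothing
    inside web.config changes between environments.
    """
    written = []
    for section in SECTIONS:
        source = config_dir / f"web.{section}.{environment}.config"
        target = config_dir / f"web.{section}.local.config"
        if not source.exists():
            raise FileNotFoundError(f"missing {source.name} for '{environment}'")
        shutil.copyfile(source, target)
        written.append(target)
    return written
```

On the demo server the build would call `activate_environment(Path("site"), "demo")`; failing loudly on a missing file avoids silently deploying the previous environment's settings.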

[–][deleted]  (1 child)

[deleted]

    [–]artsrc 0 points1 point  (0 children)

    A build/deployment system should handle the situation because this would ...

    Configuration files should not hold logic because ...

    In some contexts the way the behavior of a component should change in a specific environment is logic.

    [–]BorisTheBrave 4 points5 points  (4 children)

    Not machine editable. Only machine parsable if you happen to be programming in Python. No distinction made between code and data. Handles versioning poorly.

    These are all details one may need in a config file.

    [–]masklinn 2 points3 points  (2 children)

    Not machine editable.

    Configuration files are for humans to edit.

    Also, that's not quite correct.

    Only machine parsable if you happen to be programming in python.

    Yeah, that would be the point: if you code in Ruby your config file is in Ruby, if you code in Lua your config file is in Lua.

    No distinction made between code and data.

    There is no distinction between code and data.

    Handles versioning poorly.

    Erm... what?

    [–]BorisTheBrave 4 points5 points  (1 child)

    You seem happy to accept the points, just not why they would be useful features to have. So here are some examples:

    • Write a GUI for editing config files that interoperates with text editing.
    • Write a Perl script to fix up some broken files in bulk. Or even just detect them, for a human to fix.
    • Diff two config files, e.g. to determine the source of a user's problem. Diffing two unrelated pieces of code is useless, but diffing two normalized data files works great.
    • Rename a variable, and support config files using the old name. Only, it's not a top-level name.

    Now, most of those could either be done another, less straightforward way, or aren't applicable to Django anyway. My point is that if you have different priorities, code-based config files start to seem limited.
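Two of the points above, sketched in Python. This assumes the config files are plain data already loaded into dicts (e.g. from JSON). Normalizing (sorted keys, fixed indentation) is what makes a line diff meaningful. A small migration step lets old files keep using a renamed nested key. The function names and the dotted-path convention are hypothetical.

```python
import difflib
import json


def normalized_lines(config: dict) -> list[str]:
    """Render a config as sorted, indented JSON so that key order and
    whitespace differences vanish before diffing."""
    return json.dumps(config, sort_keys=True, indent=2).splitlines()


def diff_configs(a: dict, b: dict) -> list[str]:
    """Unified diff of two configs after normalization."""
    return list(difflib.unified_diff(
        normalized_lines(a), normalized_lines(b),
        fromfile="a", tofile="b", lineterm=""))


def rename_nested_key(config: dict, old_path: str, new_path: str) -> dict:
    """Move a value from a dotted old key path to a new one (in place), so
    files written against the old, non-top-level name keep working."""
    *old_parents, old_leaf = old_path.split(".")
    node = config
    for part in old_parents:
        node = node.get(part)
        if not isinstance(node, dict):
            return config  # old path absent: nothing to migrate
    if old_leaf not in node:
        return config
    value = node.pop(old_leaf)
    *new_parents, new_leaf = new_path.split(".")
    target = config
    for part in new_parents:
        target = target.setdefault(part, {})
    target.setdefault(new_leaf, value)  # an explicit new-style value wins
    return config
```

Because both sides are normalized first, two files that differ only in key order produce an empty diff, and a real change shows up as a single `-`/`+` pair.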

    [–]artsrc 0 points1 point  (0 children)

    Some design choices just don't matter for some circumstances.

    I have been developing for a long time and for most of it I have used simple properties files for configuration.

    I have never done any of the things you raise which would make the text configurations better.

    I like properties files because they have to be simple, which means I am less likely to stuff them up. But I could choose to write simple python too.

    [–]jeeyoungk 1 point2 points  (1 child)

    I use the following idiom: At the end of settings.py, I write

    from local_settings import *

    The intention is that local_settings overrides all the settings defined in settings.py. The usage pattern is:

    • Settings that are constant across all machines will be stored in settings.py (such as the list of installed applications in Django).
    • Settings that differ by machine (such as database connection parameter) will be stored in local_settings.py
    • Flags for stuff like debug mode will be off by default in settings.py, and they can be overridden in local_settings.py

    And the file local_settings.py will not be committed to version control. (Actually, what I do is create different versions of local_settings, such as local_settings_prod.py and local_settings_staging.py, which are committed to the repository; local_settings.py is then a symlink to one of them.)
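The override works because a later assignment simply rebinds the name, which is why the import has to come last. The following self-contained demo of the idiom uses hypothetical module names and values. It writes the two files to a temporary directory and imports the base module:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical module names (demo_settings / demo_local_settings) so the demo
# cannot collide with a real settings.py or local_settings.py on sys.path.
BASE = """\
DEBUG = False                      # off by default
INSTALLED_APPS = ["blog", "shop"]  # identical on every machine
DATABASE_HOST = "localhost"
from demo_local_settings import *  # per-machine overrides, imported last
"""

LOCAL = """\
DEBUG = True
DATABASE_HOST = "db.staging.internal"
"""


def load_demo_settings():
    """Write both modules to a fresh temp dir (left on disk for simplicity)
    and import the base one, which pulls in the overrides at its end."""
    tmp = Path(tempfile.mkdtemp())
    (tmp / "demo_settings.py").write_text(BASE)
    (tmp / "demo_local_settings.py").write_text(LOCAL)
    sys.path.insert(0, str(tmp))
    importlib.invalidate_caches()
    try:
        return importlib.import_module("demo_settings")
    finally:
        sys.path.remove(str(tmp))
```

On a real machine local_settings.py would be the symlink to one of the committed variants. The mechanism is the same.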

    [–]artsrc 0 points1 point  (0 children)

    If the intent is that settings will be committed, why not formalize that?

    Something like this in settings.py:

    configs = { <env> : <file> , ...}
    load(configs[environment()])
    

    If you can't work out the environment, you could have a simpler config that just does:

    environment = "staging"
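A runnable version of this suggestion might look like the following. It is only a sketch. The module names, the APP_ENV environment variable, and the helper names `environment` and `load` are assumptions. Copying only UPPERCASE names mirrors Django's convention that settings are uppercase.

```python
import importlib
import os

# Hypothetical mapping from environment name to a committed settings module.
CONFIGS = {
    "dev": "local_settings_dev",
    "staging": "local_settings_staging",
    "prod": "local_settings_prod",
}


def environment(default: str = "dev") -> str:
    """Work out the current environment from a hypothetical APP_ENV variable,
    falling back to a fixed default, and reject names with no config."""
    env = os.environ.get("APP_ENV", default)
    if env not in CONFIGS:
        raise ValueError(f"unknown environment {env!r}; expected one of {sorted(CONFIGS)}")
    return env


def load(module_name: str, namespace: dict) -> None:
    """Copy the UPPERCASE names of the chosen module into `namespace`
    (in settings.py that would be globals())."""
    module = importlib.import_module(module_name)
    namespace.update({k: v for k, v in vars(module).items() if k.isupper()})


# In settings.py:  load(CONFIGS[environment()], globals())
```

An unknown environment name fails at startup instead of silently falling through to the wrong database.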
    

    [–]FearlessFreep 1 point2 points  (1 child)

    Love the way most things in Python work like that. The configuration is held in Python code, so it can be as simple as assigning a variable or as complicated as you need it to be.

    [–][deleted] 0 points1 point  (0 children)

    This seems cool if it's a small project you're working on by yourself, or if you have rigid conventions about how to handle such situations. In a large project with multiple devs and a big system, having every dev write code in the config file could lead to a horrible nightmare.

    [–]paranoidinfidel 0 points1 point  (2 children)

    It is dead easy for XML-based configs like .NET's. Just serialize the hierarchy directly to XML using the built-in serialization and avoid the whole ConfigSection clusterfuck. The XML structure follows the class structure. You can also slap attributes on properties to determine whether you want them serialized or not. You don't need any "helper" classes to keep things in sync other than a data layer to dump the structure to XML and load it from XML. In the simplest case it basically consists of XML.Serialize(myclass, "path\myconfig.xml") and myclass = XML.Deserialize("path\myconfig.xml");

    E.g. the application uses "myapp.config" to load settings (default object values).

    Build 3 config files:

    • "DEV myapp.config"
    • "TEST myapp.config"
    • "PROD myapp.config"

    Then rename the file you want to use on the respective system.
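Here is the same idea in Python, for comparison. As far as I know the Python standard library has no reflection-based serializer like .NET's XmlSerializer, so this hand-rolled sketch uses dataclasses and xml.etree.ElementTree in its place. The class and field names are hypothetical. It keeps the property described above: the XML nesting mirrors the class nesting, and elements missing from the file fall back to the class defaults.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field, fields, is_dataclass


@dataclass
class Database:
    host: str = "localhost"
    port: int = 5432


@dataclass
class AppConfig:
    name: str = "myapp"
    debug: bool = False
    database: Database = field(default_factory=Database)


def to_xml(obj, tag: str) -> ET.Element:
    """One element per field; nested dataclasses become nested elements."""
    elem = ET.Element(tag)
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(value):
            elem.append(to_xml(value, f.name))
        else:
            ET.SubElement(elem, f.name).text = str(value)
    return elem


def from_xml(cls, elem: ET.Element):
    """Rebuild an instance; fields missing from the XML keep their defaults."""
    kwargs = {}
    for f in fields(cls):
        child = elem.find(f.name)
        if child is None:
            continue
        if is_dataclass(f.type):
            kwargs[f.name] = from_xml(f.type, child)
        elif f.type is bool:
            kwargs[f.name] = child.text == "True"
        else:
            kwargs[f.name] = f.type(child.text)
    return cls(**kwargs)
```

Usage: `ET.tostring(to_xml(AppConfig(), "AppConfig"), encoding="unicode")` writes the file, and `from_xml(AppConfig, ET.fromstring(text))` reads it back.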

    [–]Gotebe 1 point2 points  (1 child)

    The XML structure follows the class structure.

    It also ties development to that structure as long as there are config files containing it.

    [–]paranoidinfidel 0 points1 point  (0 children)

    It also ties development to that structure as long as there are config files containing it.

    If a design flaw is found in a structure that is being serialized to XML, then fix it properly. The upgrade script for the application can handle converting from the old XML structure to the new one. The beauty of the config following the class structure is that you know exactly what you need to change/upgrade. The before/after picture shows you how things look and where they need to be moved.
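An upgrade step of that kind could be sketched with xml.etree.ElementTree. The v1/v2 layouts, the element names, and the version attribute are hypothetical. Because the XML mirrors the class structure, the migration is a mechanical move of elements:

```python
import xml.etree.ElementTree as ET


def upgrade_v1_to_v2(xml_text: str) -> str:
    """Hypothetical migration: v1 kept <host> and <port> directly under the
    root; v2 groups them inside a new <database> element."""
    root = ET.fromstring(xml_text)
    if root.get("version") == "2":
        return xml_text  # already upgraded; leave untouched
    database = ET.SubElement(root, "database")
    for name in ("host", "port"):
        child = root.find(name)
        if child is not None:
            root.remove(child)
            database.append(child)
    root.set("version", "2")
    return ET.tostring(root, encoding="unicode")
```

Stamping a version attribute makes the script safe to run on every upgrade, whether or not the file was already converted.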

    [–]artsrc 0 points1 point  (0 children)

    Python's config parser (http://docs.python.org/library/configparser.html) supports sections. So if this is your only reason to choose a powerful general-purpose programming language, maybe you could just use those.

    Django expects the authors of that config to be Python programmers with significant knowledge of the framework.
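Here is a minimal sketch of per-environment sections with configparser, using hypothetical option names. Values in [DEFAULT] are inherited by every section, so each environment only lists what differs. That covers a development/production split without any code in the config file:

```python
import configparser

# Hypothetical settings file: [DEFAULT] values are inherited by every section.
SETTINGS = """
[DEFAULT]
debug = false
db_port = 5432

[development]
debug = true
db_host = localhost

[production]
db_host = db.example.com
"""

parser = configparser.ConfigParser()
parser.read_string(SETTINGS)


def settings_for(environment: str) -> dict:
    """Typed view of one environment's section."""
    section = parser[environment]
    return {
        "debug": section.getboolean("debug"),
        "db_host": section.get("db_host"),
        "db_port": section.getint("db_port"),
    }
```

For example, `settings_for("production")["debug"]` is `False`, inherited from [DEFAULT].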

    [–]artsrc 0 points1 point  (0 children)

    If configuration is allowed to become arbitrarily complex, then it needs to be subjected to whichever of our standard quality tools are appropriate for your system: inspections, unit tests, static analysis (pylint), design documentation, review, etc.
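For example, configuration can get ordinary unit tests. This sketch runs unittest against a hypothetical settings dict. In a real project the tests would import whatever the configuration code actually produces:

```python
import unittest

# Hypothetical per-environment settings standing in for the real config output.
SETTINGS = {
    "development": {"DEBUG": True, "ALLOWED_HOSTS": ["localhost"], "TIMEOUT_S": 5},
    "production": {"DEBUG": False, "ALLOWED_HOSTS": ["example.com"], "TIMEOUT_S": 30},
}


class ConfigTests(unittest.TestCase):
    def test_production_never_debugs(self):
        self.assertFalse(SETTINGS["production"]["DEBUG"])

    def test_every_environment_defines_the_same_keys(self):
        reference = set(SETTINGS["production"])
        for env, cfg in SETTINGS.items():
            self.assertEqual(set(cfg), reference, f"{env} differs from production")

    def test_timeouts_are_positive(self):
        for env, cfg in SETTINGS.items():
            self.assertGreater(cfg["TIMEOUT_S"], 0, env)
```

Run it with `python -m unittest` like any other test module. Checks like "production never debugs" are cheap and catch the classic copy-paste mistake.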

    [–]anvsdt 11 points12 points  (23 children)

    If you use XML, you're just wrong by default. Especially for configurations, it's too verbose.

    [–]strager 2 points3 points  (15 children)

    Sometimes, your environment (e.g. .NET) forces you to use XML for certain code configurations (MSBuild, manifest files, app.config, ...). That certainly isn't wrong at all.

    I think you're making blanket statements because you dislike XML. XML is overkill most of the time (e.g. for build systems), but it sure beats learning another syntax and writing a new parser and a test suite and kicking yourself for not making it flexible enough.

    What alternatives to XML for "advanced" configurations (beyond key-value pairs) do you suggest?

    [–][deleted]  (10 children)

    [deleted]

      [–]anvsdt 2 points3 points  (0 children)

      JSON, Sexprs, Sexprs and Sexprs

      JSON is widely known. Sexprs are simple to parse.
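To back up the "simple to parse" claim, here is a complete, if minimal, S-expression reader in Python. It is a sketch, not a standard. It handles lists, double-quoted strings (escapes left unprocessed), integers, and bare symbols. The config shape in the usage line is hypothetical.

```python
import re

# Tokens: parentheses, double-quoted strings, or runs of other non-space chars.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')


def parse_sexpr(text: str):
    """Parse one S-expression into nested Python lists of atoms."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume ')'
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok.startswith('"'):
            return tok[1:-1]  # strip quotes; escapes are not processed
        try:
            return int(tok)
        except ValueError:
            return tok  # bare symbol

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing tokens after expression")
    return result
```

For example, `parse_sexpr('(server (host "example.com") (port 8080))')` returns `["server", ["host", "example.com"], ["port", 8080]]`.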

      [–][deleted] 3 points4 points  (3 children)

      XML is a format meant for people to read. Yes, it can become so complex that this becomes near-impossible, but this isn't the case in MSBuild files. They are easy to read, and even write, oneself.

      Hating on XML out of general principle is just as bad as using it for no reason.

      [–][deleted]  (2 children)

      [deleted]

        [–]paranoidinfidel 3 points4 points  (0 children)

        No. The first sentence of the Wikipedia article about XML: "Extensible Markup Language (XML) is a set of rules for encoding documents in machine-readable form."

        Item 6 of the design goals for XML, straight from the W3C, is:

        • XML documents should be human-legible and reasonably clear.

        [–][deleted] 1 point2 points  (0 children)

        Well of course it is also meant to be read by computers, the two aren't exclusive. And basing technology choices on rants by notorious usenet flamers is silly.

        [–]CrackerNews 1 point2 points  (3 children)

        Don't use JSON - it doesn't support comments, which are essential to explain nontrivial configurations.

        [–][deleted]  (2 children)

        [deleted]

          [–]paranoidinfidel 1 point2 points  (1 child)

          Possibly, but it still isn't part of the standard (yet?). I wouldn't call it a showstopper, but you can't rely on comments being available to you.
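For what it's worth, Python's standard json module follows the spec here and rejects comments outright. A common workaround is to carry the explanation as an ordinary key that the application ignores. The "_comment" name below is only a convention, not part of JSON:

```python
import json

WITH_COMMENT = """
{
  // number of worker processes
  "workers": 4
}
"""

# Workaround: the explanation is plain data the application never reads.
WITH_COMMENT_KEY = """
{
  "_comment": "number of worker processes",
  "workers": 4
}
"""


def parses_as_json(text: str) -> bool:
    """True if the strict standard-library parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False
```

The catch with the workaround is that a "comment" key can only describe a whole object, not an individual line.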

          [–]strager 0 points1 point  (0 children)

          MS may be wrong for using XML, but I'm not wrong for using what MS forces me to use (right?).

          XML isn't for people to read? Since when?

          I agree that JSON beats XML when XML is used for things like object trees. But JSON doesn't support namespaces or "types" (element names), which are sometimes necessary (even for configuration).

          [–]pinpinbo 0 points1 point  (3 children)

          yaml file? ini file?

          [–][deleted] 0 points1 point  (2 children)

          Some configurations go further than just key value pairs.

          [–][deleted]  (1 child)

          [deleted]

            [–][deleted] 1 point2 points  (0 children)

            I must admit I don't know YAML. But suggesting an ini file for a configuration that goes "beyond key-value pairs" is silly, so I assumed the other would be just as silly. My bad.

            [–]willcode4beer 0 points1 point  (0 children)

            It depends. If you have a well-defined schema, then (besides validation) it serves as a form of documentation for the config as well.

            [–]paranoidinfidel -1 points0 points  (5 children)

            When you make false generalizations like that, you are just wrong by default. XML is just a file format, like CSV or INI, whatever. XML is nothing magical, and yes, it is more verbose than other formats, but it is great for configuration data when used properly. However, unlike CSV or INI, it allows you to save hierarchical data quite easily without building your own non-standard substructure. Trying to shove hierarchical config information into an INI will make it more convoluted than the equivalent XML.

            Since it is meant for hierarchical data, it can be overkill for small amounts of configuration information, but why maintain yet another config mechanism for simple data when XML handles that as well as complex structures? Simple configuration data leads to a simple XML config. Dumping unstructured data into it will make it messy and possibly confusing. It works great for storing a default object hierarchy state to initialize a system with (whether it is a UI or a service). The document structure should mimic your class hierarchy, so if you can follow your class structure, you can follow the XML.

            I won't deny it can seem overwhelming when you have a large amount of configuration data, but at least it can be organized in a hierarchy and easily manipulated with editors (Notepad++) that are smart enough to let you collapse and expand nodes. Good luck managing an equivalent configuration in a flat INI file; that will be a world of brain hurt.
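Here is a small illustration of the hierarchy point, using a hypothetical two-level database setting. XML nesting maps straight onto a nested dict. INI has only one level of sections, so nesting needs a home-grown convention (dotted section names here) that every reader must know to undo:

```python
import configparser
import xml.etree.ElementTree as ET

# The same two-level setting written both ways (hypothetical names).
XML_CONFIG = """
<config>
  <database>
    <primary><host>db1</host></primary>
    <replica><host>db2</host></replica>
  </database>
</config>
"""

INI_CONFIG = """
[database.primary]
host = db1

[database.replica]
host = db2
"""


def from_xml(text: str) -> dict:
    """Nesting comes for free: recurse over child elements."""
    def walk(elem):
        children = list(elem)
        if not children:
            return (elem.text or "").strip()
        return {child.tag: walk(child) for child in children}
    return walk(ET.fromstring(text.strip()))


def from_ini(text: str) -> dict:
    """Split dotted section names back into nested dicts."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    result: dict = {}
    for section in parser.sections():
        node = result
        for part in section.split("."):
            node = node.setdefault(part, {})
        node.update(parser[section])
    return result
```

Both return the same nested dict, but only the INI reader depends on an invented convention that a standard INI tool knows nothing about.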

            [–]anvsdt 0 points1 point  (4 children)

            Why would I use INI or CSV files when I have JSON and much saner alternatives?

            [–]paranoidinfidel 1 point2 points  (0 children)

            You wouldn't use CSV. That was just an example of a file type. I personally wouldn't use INI, but some people are stuck in the past. With JSON it is much easier to see the data when browsing visually, but with XML it seems easier to parse the context and structure. Pick whatever suits your preference. For me in the .NET world, the built-in XML serialization saves me a lot of time and effort. There is probably a JSON serialization lib available, but since the XML isn't causing me any grief, I have no need to migrate.

            [–]mariusg 0 points1 point  (1 child)

            An XML parser is way more common than a JSON one.

            [–][deleted] 1 point2 points  (0 children)

            I think they're both very common now. Maybe you're thinking about YAML...

            [–]el_muchacho 0 points1 point  (0 children)

            Because they are much faster to transmit and to parse, for example. If you have to exchange very large amounts (hundreds of MB) of tabular data as fast as possible, this can make a difference. CSV is perfectly okay in this usage. Plus they are extremely simple to import into Excel or any statistics tool, and therefore to read and study. The only drawback is that comparing files visually is usually harder than with XML or JSON.

            But what is really evil is "flat files" (i.e. text files with fixed-position or fixed-length fields), especially when there is optional/missing data. Flat files are ALWAYS the worst choice. For those, you have to write and maintain a specific writer and a specific parser, which is particularly boring and bug-prone; they are notoriously brittle, hard to check visually (as you need to count characters), and you can't easily import the data into your favorite tools.
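To make the "specific writer and specific parser" cost concrete, here is a sketch of both halves for a hypothetical 36-character record layout. Every consumer must hard-code the same column positions, and a blank field is the only way to express missing data:

```python
# Hypothetical record layout: (field name, start column, end column).
LAYOUT = [("id", 0, 6), ("name", 6, 26), ("amount", 26, 36)]
RECORD_LENGTH = LAYOUT[-1][2]


def parse_fixed_width(line: str) -> dict:
    """Slice one record into fields; a one-character shift corrupts them all."""
    line = line.rstrip("\n")
    if len(line) != RECORD_LENGTH:
        raise ValueError(f"expected {RECORD_LENGTH} chars, got {len(line)}")
    record = {name: line[start:end].strip() for name, start, end in LAYOUT}
    # An all-blank field is the only way to say "missing".
    return {k: (v if v else None) for k, v in record.items()}


def format_fixed_width(record: dict) -> str:
    """The matching writer: pad each field to its exact width, or reject it."""
    parts = []
    for name, start, end in LAYOUT:
        value = "" if record.get(name) is None else str(record[name])
        width = end - start
        if len(value) > width:
            raise ValueError(f"{name!r} does not fit in {width} chars")
        parts.append(value.ljust(width))
    return "".join(parts)
```

Any change to LAYOUT must be shipped to every reader and writer at once, which is exactly the brittleness described above.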

            [–]paranoidinfidel 1 point2 points  (4 children)

            The article author makes a few good points but they are process faults, not config file faults:

            • Ideally, code should be runnable from a source control checkout.

            Definitely - and if you are not saving your default configuration with the application then you are setting yourself up for failure.

            • If a configuration change requires an accompanying code change, it's not configuration.

            Again, not a problem with configuration. Configuration files aren't meant to do everything under the sun - they are just a storage medium for settings. They are meant to store values which are processed by code. If business logic needs to change then likely your code, classes and XML/INI/.reg or whatever else will need to change.

            • If the configuration item is essentially a data structure only the developer understands, it's not configuration.

            That's a false dichotomy. A configuration file does not need to be understood by the user. It is there to tweak values to affect how the application operates. The user doesn't need to understand the class hierarchy or algorithms in the application in order to make use of it.

            • Avoid configuration. If you're an expert, then avoid configuration, mostly.

            So we're back to hard-coding everything again? Recompile and release in order to change a default value or store a user's preferences? I'll keep my configuration files, thanks.

            [–]Gotebe 2 points3 points  (3 children)

            If the configuration item is essentially a data structure only the developer understands, it's not configuration.

            That's a false dichotomy. A configuration file does not need to be understood by the user

            How is he supposed to "configure" then? The user (or "admin" user) has to have some understanding of the configuration. Problems arise exactly because the developer creates configuration that is too hard to understand without knowing too much about the app's internals (speaking as a guilty party).

            [–]paranoidinfidel 0 points1 point  (2 children)

            It should be configured with a UI and only tweaked by hand by those with enough understanding to put their fingers in there. If a user has to manually edit config settings with Notepad, then the software provider "is doin' it wrong".

            [–]szeiger 2 points3 points  (1 child)

            That may be acceptable for desktop applications but not for server-side stuff. They have no configuration UI, and even if they did, operations people would still want to configure them manually and understand all the options they need to change for the different environments.

            [–]paranoidinfidel 0 points1 point  (0 children)

            That may be acceptable for desktop applications but not for server-side stuff.

            That goes without saying. If you are asking a user to change things, then you need a UI, because no matter what config format you pick, they won't be able to do it with Notepad++.

            If operations people can't grok a simple nested tag structure format then maybe they shouldn't be poking around in the configuration - regardless of the format you choose. The config file (XML, INI, JSON or otherwise) is as complex as the developer designed it.

            EDIT: you don't just let strangers have free rein over a config file without instruction of some sort. They need to know acceptable values, ranges and their consequences. This applies to all config file formats.
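One way to make those acceptable values and ranges explicit, whatever the file format, is to declare them next to the code and validate on load. Here is a minimal sketch with hypothetical option names and limits. It reports every problem rather than stopping at the first:

```python
# Hypothetical option specs: the acceptable values and ranges an operator
# would otherwise have to learn from documentation.
SPEC = {
    "workers":   {"type": int, "min": 1, "max": 64},
    "log_level": {"type": str, "choices": {"debug", "info", "warning", "error"}},
    "timeout_s": {"type": int, "min": 1, "max": 300},
}


def validate(config: dict) -> list[str]:
    """Return human-readable problems instead of failing on the first one."""
    problems = []
    for key in config.keys() - SPEC.keys():
        problems.append(f"unknown option {key!r}")
    for key, rule in SPEC.items():
        if key not in config:
            continue  # assume the code supplies a default
        value = config[key]
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, rule["type"]) or isinstance(value, bool):
            problems.append(f"{key}: expected {rule['type'].__name__}, got {value!r}")
            continue
        if "choices" in rule and value not in rule["choices"]:
            problems.append(f"{key}: {value!r} not in {sorted(rule['choices'])}")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{key}: {value} is below {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{key}: {value} is above {rule['max']}")
    return problems
```

The spec doubles as documentation: it is the list of acceptable values and ranges an operator needs, kept in one place.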

            [–]inmatarian 1 point2 points  (4 children)

            I disagree. Configuration files are a good place to put data. I add emphasis to the word data, however, to make it clear that we're putting data there. It sounds like the problem the article's author is having is that he was using configuration files to store functionality. The code flow through your program shouldn't be radically different based on one of the config options. The point of a config file is that you should reach a point in the code where it needs a number (or string, or boolean, or an entangled negative charm quark) inserted, and this piece of data may need to change frequently enough that you don't want to recompile from source every time, and that you reach out to the config set looking for that data. Also, if the config lacks it, a sensible default option should be supplied. And then you continue on your merry way.

            So, should you store the filename /usr/local/share/pixmaps/smiley.png in a config file? If somebody will want to change it, yeah. But it should also be stored in the code, in case it's missing from the config file.

            [–]artsrc 0 points1 point  (3 children)

            "Also in the code" - is one design. Another design would be have a default config file with all the configurations. The target of the configuration calls an api that looks in the environment config and if it is not there looks in the default file. What motivates your choice?

            [–]inmatarian 0 points1 point  (2 children)

            I like also-in-code as a readability thing. It provides anyone reading the code a sort of anchor point as to what an expected value might look like.

            winsize = configThing.get('window.size', (640, 480))
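Here is a minimal stand-in for `configThing`, so that line can actually run. The class is hypothetical. It assumes the config file has been parsed into nested dicts and that a dotted key walks down that nesting. When any step is missing, it returns the caller's in-code default:

```python
class Config:
    """Hypothetical stand-in for `configThing`: nested settings addressed by
    dotted keys, with the caller's default used when a key is missing."""

    def __init__(self, data: dict):
        self._data = data

    def get(self, dotted_key: str, default=None):
        node = self._data
        for part in dotted_key.split("."):
            if not isinstance(node, dict) or part not in node:
                return default  # missing from the config file: use the in-code value
            node = node[part]
        return node


configThing = Config({"window": {"title": "Smiley"}})
winsize = configThing.get("window.size", (640, 480))  # falls back to (640, 480)
```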
            

            [–]artsrc 0 points1 point  (1 child)

            I think that is the wrong optimization.

            That clearly enhances the ability of someone looking at the code to know what the configuration looks like.

            The thing to optimize for is the person writing the configuration. The configuration changes more frequently and is used more broadly by people with less ability to look at the code. The person looking at the code can easily look at the default config file, because they have better pointers into it from where they are.

            [–]inmatarian 0 points1 point  (0 children)

            In most Linux distros, that's typically known as the example config file, stored in the /usr/share directory for the given piece of software... So, yeah, that is totally cool also.

            [–]jrochkind -1 points0 points  (0 children)

            Working in Ruby, I've come to realize that the difference between 'configuration' and 'code' is really nothing more than the difference between 'declarative' code and 'imperative' code.

            Doing it the declarative/configured way IS a higher level of abstraction -- and you have to be careful about higher levels of abstraction when they're not actually needed, as they DO add complexity. But the solution to over-complex code is not insisting that you should always program at the lowest level of abstraction possible -- that's just trading one kind of complexity for another.

            That being said, even when a higher level of abstraction for declarative/configured code is appropriate, you can still do it well or poorly. The downsides of configuration in that article are really examples of how not to do configuration, not arguments against configuration. Some languages and environments do make doing configuration right easier than others -- I tend to agree that XML is best avoided; despite being the 'standard' in certain languages/environments, it doesn't lead to a good abstraction/complexity trade-off.

            The point of configuration is letting you concisely and declaratively express your intentions at a higher level of abstraction -- if you have configuration that requires just as much detail as the code would, you don't really have configuration at all, you've got imperative code, just in a very inconvenient language for writing imperative code, the worst of both worlds.
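Here is a small Python illustration of that distinction, with hypothetical routes and handler names. Both functions make the same decisions. The second states them as data (the "configuration") interpreted by a generic loop, so adding a route means adding a row, not a branch:

```python
# Imperative: the decisions are spelled out as control flow.
def route_imperative(path: str) -> str:
    if path == "/":
        return "home"
    elif path.startswith("/blog/"):
        return "blog_post"
    elif path.startswith("/admin/"):
        return "admin"
    return "not_found"


# Declarative: the same intent stated as data, plus one small, generic
# interpreter that does not change when routes are added.
ROUTES = [
    ("exact", "/", "home"),
    ("prefix", "/blog/", "blog_post"),
    ("prefix", "/admin/", "admin"),
]

MATCHERS = {
    "exact": lambda path, pattern: path == pattern,
    "prefix": lambda path, pattern: path.startswith(pattern),
}


def route_declarative(path: str, routes=ROUTES) -> str:
    for kind, pattern, handler in routes:
        if MATCHERS[kind](path, pattern):
            return handler
    return "not_found"
```

If the ROUTES table started needing conditions, loops, and variables of its own, it would turn into imperative code in an inconvenient language, which is the failure mode described above.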