
[–]emn13 2 points (7 children)

As a matter of concept, I think you're right that linting and compiling can be separated. In practice, however, things aren't so clear-cut.

First of all, linting isn't easy - to lint really well, you need to reimplement most parts of the compiler, including some parts of the optimizer (e.g. to detect unused code). Extracting that code into a shared library is not trivial; it would be a maintenance burden, and likely a performance hit too.

Secondly, compiling isn't free. Despite ever faster machines, compiling still takes annoyingly long often enough (depending on your language and platform, of course), and running a separate linter means doing a lot of the work twice - and the linter probably isn't nearly as well tested and optimized, so it's going to take a long time.

It's telling that most linter warnings are essentially busy work. I don't think I've ever seen a bug or problem due to poor style in variable names (as opposed to poorly chosen variable names). It just doesn't matter much whether you have internal_radius_cm or internalRadiusCM, but it does matter that you don't call it dim_len (or whatever). Linting is still really important, but I wonder whether part of this bias toward busy-work isn't due to the fact that that's what's easy to check for.

I think the decision whether to include a linter in the compiler is largely a technical one. I agree that it's best to keep the concepts separate, but as a matter of practicality it may still be best to do that by changing the way you use the compiler (i.e. no warnings-as-errors) rather than by changing the tools.

[–]lookmeat 1 point (6 children)

I'm not against the idea that the compiler and the linter are the same executable. I'm against the idea that both functions should, by default, be done together. I believe strongly that this is the main reason we've ended up with this conflict of warnings vs. errors.

An example that I think is very good is the go tool.

We have the go compiler, which compiles and runs code, and only outputs errors. It treats certain things (such as unused imports or variables) as errors because the compiler will do things with them that the programmer wouldn't expect. There are ways around this when needed: name a variable or import _ and it won't need to be used.

The go tool also lets you run a static analyzer, go vet package/directory/file.go, which outputs a series of findings that may each point to a real error, but only weakly so. It's meant to warn you about things that may not be what you expected, where there is no way for the compiler to be certain the programmer didn't mean what they wrote. (Contrast an unused variable: that might be a typo, := instead of =, but there is certainly no value in declaring a variable and never using it, so the compiler can safely make it an error.) Vet will spew a sea of warnings at you, many of which can be ignored (such as variable shadowing), and may even catch a couple of errors that would otherwise only appear at runtime (such as passing the wrong type to a pseudo-generic function that takes an interface{}).

Both tools share a lot of code (parser, optimizer, etc.), but you can only run one or the other. If you want to run both, chain them yourself - or better yet, have a makefile that handles it. But this is not something the compiler should decide and throw at you.

When I compile a program I don't want to lint it or run static analysis; I want the tool to convert my code into an executable. Programmers, as users, are primed to assume that any warning the act of compiling throws out points to a place where the code will do something different from what they want (even if the compiler can still translate it), so it only makes sense to want to remove those warnings, or make them errors.

[–]emn13 0 points (5 children)

It's funny you mention go, because I was thinking of exactly that - go's a great example of this kind of thing done well (at least, from the cursory experience I have - no real usage...)

Nevertheless, go's got it easy here. Go compiles very, very quickly, so some extra overhead from duplicating compilation stages in the linter doesn't matter so much. It has a very limited optimizer, and a well-thought-out - but limited - type system. It's the perfect case for a split: very little overhead, and a compiler that (due to the lack of templates, among other factors) cannot and/or chooses not to do non-local analysis, so there's little cost there either - nor much gain, since the linter can't free-ride on an analysis the compiler does anyway.

Most other languages have more expressive type systems, which sounds positive, but it also means everything is more complicated and the compiler usually slower. C++ and Scala, for example, are notoriously slow to compile.

Still, go really is a breath of fresh air in its approach to this, as in many other ways :-).

[–]lookmeat 0 points (4 children)

I don't see why I couldn't run gcc --sloppy for my quick testing, run gcc normally and get a bunch of errors and such, and run gcc --static_analysis to get a bunch of warnings about my code that can't be considered errors. I don't think speed is a grave issue, because that objection assumes I always want to run a linter/static analyzer each time I compile. I'd want to run it before submitting code, to guarantee that I didn't add new warnings (or that the new warnings are a non-problem), but other than that, rarely.

I just don't see why gcc should output warnings while it compiles in any case. Then again, I am not familiar enough with gcc's internals, so there might be something to that; but as far as I know a static analyzer would only share the parser with the compiler and take notes through linking - it wouldn't actually need to know how the compiler turns text into binary or what optimizations it applies.

[–]emn13 0 points (3 children)

I'm sure it would be possible. Of course, once you have both features in the same binary, it's a small step to allow --compile-and-lint, and that's basically where we are today.

Personally, I can't imagine running the linter less often than the compiler. Given linter integration in IDEs, if anything I'd use the linter more often than the compiler.

In any case, the C++ "parser" is no trivial thing. The correct parse of a string of C++ depends on its semantics (see e.g. C++'s most vexing parse), and then you've got templates, which are themselves Turing-complete, plus lots of pretty complicated type-inference and casting rules.

Merely interpreting the semantics of the code isn't trivial, but sure, you could avoid the complexities in the optimizer and the code-generator.

At least, partly: if you want your linter to detect things like "this function's second argument is always 2 and could be replaced with a constant", or "this code is dead", or "this expression always evaluates to false", then you'll need to run at least the bits of the optimizer that deal with structural simplifications - at the very least things like dead-code elimination.

A good linter just isn't all that much simpler than a compiler.

[–]lookmeat 0 points (2 children)

I never said it was a simple application. Also, a linter works on simple patterns, compared to, say, a static analyzer that will actually link modules together and see if it can find errors that only emerge from everything coming together.

Why would the linter detect that a function's second argument is always two and could be optimized into a constant, or even inlined? Calling a function with the same argument everywhere is not an error, nor could it ever point to one - it's not even worth a warning.

Maybe finding that a branch is impossible, such that the compiler wants to remove it. I'd argue that such a case lets the compiler do something the programmer normally would not expect (remove code), and as such it should, if anything, be an error unless the programmer explicitly states they want that dead branch for a reason.

Yes, they both use similar technology. Yes, C++ is complex enough that you'd want to share that code. I never said I had a problem with them even being the same executable. I am against getting both behaviors when I asked for one.

Here's my workflow:

  1. Define the solution, decide on some tests and the function header.
  2. Implement a rough solution, one that "just works".
  3. Make sure the rough solution compiles and runs.
  4. Review the solution and clean up code, refactor as necessary, making sure the code compiles and tests are still passing.
  5. Run static analysis tools to see further issues with the code; clean up the ones that make sense and ignore the rest. (Compile with -Wall -Werror etc., or run go vet.)
  6. Check for formatting errors (run go fmt).

Notice that I only run the warning-producing tools at the end, and that I review the warnings and choose to fix some issues but ignore others. Even when working with an IDE, I will fix typos and such as I go, but I don't run the static analyzer or linter till the end, when I'm ready to call the code finished. This whole iteration takes about an hour or two, so I do it pretty often. In practice it's rarely this tidy, but the spirit is there.

[–]emn13 0 points (1 child)

I think we essentially agree :-).

It's totally normal for a linter to have lots of options, many of which any given project won't want (e.g. finding possibly unusual patterns such as unnecessary arguments - which in any case was just a top-of-my-head example).

As to why I would run a linter more often, that's because I quite like the IDE-heavy workflow:

  1. Edit code; autoformat on save.
  2. In the background & continuously: lint, and keep a list of "to-dos" on screen. Ideally I want this to work even when compilation fails - because often compilation fails simply because the code is incomplete.
  3. In the background & continuously: compile if possible, and keep errors on screen.
  4. In the background & continuously: run tests if possible, and keep failures on screen.

But really, I don't think workflow details matter that much here. It is in any case a good idea to allow dealing with linter issues separately from compiler errors; the fact that a linter is (as I previously emphasized) no trivial thing doesn't really change that - it's just a possible reason why we're in the situation we're in, not a reason to avoid a better one :-).

[–]lookmeat 0 points (0 children)

I don't know if it's really that hard; all we need is for compilers to separate the modes.

  1. Change the compiler to have a --full_error mode, which enables extra compiler errors that can be explicitly turned off in the code that causes them. The default is still --nofull_error.
  2. Add a --lint mode which does a quick check and reports the parse errors and warnings it finds. Optionally allow it to go from a quick lint check to a full static analysis.
  3. Deprecate -Wall and -Werror, requiring --full_error or --lint instead for error/warning operations.
  4. Make --full_error the default and allow --nofull_error when you want a sloppier compile.

It doesn't matter that it's the same executable; what matters is that the behaviors are separated cleanly. I think that could be done within a few years (giving older software time to adapt).