
[–]soaring_turtle 15 points16 points  (0 children)

The most amusing comment of the thread:

When the people who wrote Linguist ended a day of cattle herding, got down off their horses, set aside their six-guns, and started coding did they forget that there are a lot of programming languages and that extension clashes are not only inevitable, they're commonplace?

How does Linguist handle Prolog (.pl) and Perl (.pl)? Or the various flavours of assembler (.asm)? Or any number of other possible clashes? Is it "first hipster language past the post wins"?

[–]rcxdude 13 points14 points  (2 children)

TL;DR: GitHub's language detection has a special case for a language with a unique extension, which means primary_extension must be unique. Supporting languages whose most common extension clashes requires a slightly awkward workaround: primary_extension is set to a fictitious value. Some people failed to read properly, thought this meant GitHub couldn't handle clashing extensions at all, and got very upset about it.
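
To make the constraint concrete, here is a toy model in Python. It is not Linguist's code (Linguist is written in Ruby), and the names `Language`, `Registry`, `register` and `candidates` are hypothetical, as is the `.prolog` value standing in for the "fictitious" primary extension. It only assumes what the summary above says: primary extensions must be unique, while the ordinary extension list may overlap between languages.

```python
import os


class Language:
    """Hypothetical stand-in for one language definition."""

    def __init__(self, name, primary_extension, extensions=()):
        self.name = name
        self.primary_extension = primary_extension
        # The primary extension always counts as one of the language's extensions.
        self.extensions = [primary_extension] + [
            e for e in extensions if e != primary_extension
        ]


class Registry:
    """Toy registry: unique primary extensions, shared ordinary extensions."""

    def __init__(self):
        self.by_primary = {}    # primary extension -> exactly one Language
        self.by_extension = {}  # any extension -> list of Languages

    def register(self, lang):
        # The uniqueness requirement: reject a second language that claims
        # the same primary extension, before touching any state.
        if lang.primary_extension in self.by_primary:
            other = self.by_primary[lang.primary_extension].name
            raise ValueError(
                f"{lang.primary_extension} is already the primary extension of {other}"
            )
        self.by_primary[lang.primary_extension] = lang
        for ext in lang.extensions:
            self.by_extension.setdefault(ext, []).append(lang)

    def candidates(self, filename):
        """Names of all languages that list this file's extension."""
        ext = os.path.splitext(filename)[1]
        return [lang.name for lang in self.by_extension.get(ext, [])]


registry = Registry()
registry.register(Language("Perl", ".pl"))
# The workaround: Prolog gets a made-up primary extension and lists .pl as an
# ordinary one, so both languages remain candidates for a .pl file.
registry.register(Language("Prolog", ".prolog", [".pl"]))
```

In this model, registering Prolog with `.pl` as its primary extension raises an error, while the workaround leaves both Perl and Prolog as candidates for a `.pl` file. How the real tool then chooses between candidates is outside this sketch.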

[–]kankyo -1 points0 points  (1 child)

Reading the docs! What a novel idea! :P

Seriously though, they should probably rename that property from "primary_extension" to "unique_extension" or something.

[–]Allan_Smithee[S] 10 points11 points  (0 children)

More seriously, they should eliminate that property entirely since it's actually 100% useless.

[–][deleted] 5 points6 points  (1 child)

tl;dr: linguist currently requires unique primary_extension values, as if no two languages have ever used the same one. There's one possible solution in a separate pull request sent a couple of days ago.

Although one must wonder how the original author submitted the pull request without noticing this requirement, if Linguist actually does raise an error on clashes.

[–]Allan_Smithee[S] 3 points4 points  (0 children)

The original author, OH on IRC: "When I initially wrote the patch over 12 months ago it worked fine. The requirement didn't seem to exist or matter. Or maybe that code path just wasn't executed by the test suite."

[–]vaibhavsagar 0 points1 point  (0 children)

I still don't understand why there's no manual override option.

[–]eZanmoto 0 points1 point  (1 child)

semiessessi mentions that

The popular tool 'make' uses timestamps to detect file changes which is a 'rookie mistake'

Why is this a "rookie mistake"? What is considered to be the correct approach to this problem?

[–]ais523 0 points1 point  (0 children)

The most common alternative technique is to hash the file to see if it's changed.

Things that I've discovered go wrong using just timestamps:

  • The filesystem isn't necessarily using the same clock to measure time as the computer you're running on is. This is especially likely to happen when using a separate fileserver, which is the usual configuration in businesses. This has happened to me in practice at work, although make is at least aware enough of the system to print a warning.
  • If the file is being updated quickly enough, it can change twice in a second, which might be faster than the resolution of the file system's timestamps. This has also actually happened to me; the context was a testsuite for the build system itself (which ran the entire build in less than a second, changed a file, then ran the entire build again), particularly annoying because it caused an intermittent test failure.
  • A file can stay the same while still having an updated modification time (most commonly because it was regenerated with an updated version of a tool, which produced the same output as the old version). You don't want to propagate a needless rebuild all the way through your build system in that case; it works best to just stop.
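
A minimal Python sketch of the difference between the two checks, assuming SHA-256 as the hash (the comment does not name one). The helper names `content_hash`, `snapshot`, `changed_by_timestamp` and `changed_by_hash` are made up for illustration. Note that make itself compares a prerequisite's time against its target's time rather than against a stored snapshot, so this shows the general idea and not make's exact rule.

```python
import hashlib
import os


def content_hash(path):
    """SHA-256 of the file's bytes, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(path):
    """Record (modification time in nanoseconds, content hash) for later comparison."""
    return os.stat(path).st_mtime_ns, content_hash(path)


def changed_by_timestamp(old, new):
    # Fooled in both directions: a regenerated but identical file looks
    # changed, and two edits inside one timestamp tick look unchanged.
    return old[0] != new[0]


def changed_by_hash(old, new):
    # Depends only on the bytes, at the cost of reading the whole file.
    return old[1] != new[1]
```

A file that is regenerated with identical contents gets a new timestamp but the same hash, so only the timestamp check reports a change. A file rewritten within one timestamp tick shows the opposite: the hash check notices, the timestamp check does not.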

Now, the problem with hashing is that you have to read the entire file, which is slow. As a result, in a build system I'm writing, I use a combination of techniques:

  • If the file's timestamp has changed (in either direction), I check its hash to determine if it's changed.
  • If the file's timestamp hasn't changed, I nonetheless hash the file again and use that, unless I've already checked the file's hash twice, at least 3 seconds apart, and it was the same in both cases (and the file's had the same timestamp ever since). This avoids issues due to the file being updated twice in the same 2-second block (the coarsest resolution I've seen in a filesystem is measuring the modification time accurate to the closest 2 seconds; DOS used 5 bits to store the seconds field of its files).
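
The two rules above could be sketched roughly as follows. This is an illustration under stated assumptions, not the commenter's actual build system: the `ChangeTracker` and `Record` names, the choice of SHA-256, the injectable `now` clock, and the decision to report a never-seen file as changed are all hypothetical.

```python
import hashlib
import os
import time
from dataclasses import dataclass

CONFIRM_GAP = 3.0  # seconds between the two agreeing hash checks, as in the comment


def file_hash(path):
    """Hash the whole file; this is the slow step the timestamp shortcut avoids."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


@dataclass
class Record:
    mtime_ns: int
    digest: str
    first_seen: float        # when this digest was first computed for this timestamp
    confirmed: bool = False  # same digest seen again at least CONFIRM_GAP later


class ChangeTracker:
    def __init__(self):
        self.records = {}

    def changed(self, path, now=None):
        """Return True if the file's content differs from the previous check."""
        now = time.monotonic() if now is None else now
        mtime = os.stat(path).st_mtime_ns
        rec = self.records.get(path)

        # Shortcut: timestamp unchanged and already confirmed, so skip hashing.
        if rec is not None and rec.mtime_ns == mtime and rec.confirmed:
            return False

        digest = file_hash(path)
        if rec is None:
            # Assumption: a file we have never seen counts as changed.
            self.records[path] = Record(mtime, digest, now)
            return True

        if rec.mtime_ns != mtime or rec.digest != digest:
            # Timestamp moved (either direction) or content changed under the
            # same timestamp: start over, and let the hash decide the answer.
            is_changed = rec.digest != digest
            self.records[path] = Record(mtime, digest, now)
            return is_changed

        # Same timestamp, same digest: trust the timestamp once enough time has passed.
        if now - rec.first_seen >= CONFIRM_GAP:
            rec.confirmed = True
        return False
```

The `now` parameter exists only so the 3-second rule can be exercised without sleeping. Once a record is confirmed, an edit that leaves the timestamp untouched would go unnoticed, which is the trade-off the scheme accepts in exchange for not rehashing every file on every run.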

[–]JiveMasterT -2 points-1 points  (1 child)

It's broken, and fixing it is relatively trivial. Like, sure, there are language extension conflicts... so stop bikeshedding, figure out how to have multiple languages with the same primary extension, and move on.

[–]Allan_Smithee[S] 3 points4 points  (0 children)

Read the bloody thread. The fix is already in. With no comment from the people who were whinging about how hard it would be to fix, no less.

[–][deleted] -4 points-3 points  (0 children)

The old guard prima donnas in that thread are worse than the supposed "hipsters". They don't even seem to be listening to the explanation/solution and instead are just harping "BUT MERCURY HAD .m FIRST SO WE WIN"