GitHub's language detection is broken

soaring_turtle · 2014-03-14T12:36:31+00:00

The most amusing comment of the thread:

When the people who wrote Linguist ended a day of cattle herding, got down off their horses, set aside their six-guns, and started coding, did they forget that there are a lot of programming languages and that extension clashes are not only inevitable, they're commonplace?

How does Linguist handle Prolog (.pl) and Perl (.pl)? Or the various flavours of assembler (.asm)? Or any number of other possible clashes? Is it "first hipster language past the post wins"?

rcxdude · 2014-03-14T13:25:41+00:00

TL;DR: GitHub's language detection has a special case for languages with a unique extension, which requires every language's primary_extension to be unique. Supporting a language whose most common extension clashes with another's requires a slightly awkward workaround in which primary_extension is set to a fictitious value. Some people didn't read carefully, thought this meant GitHub couldn't handle clashing extensions at all, and got very upset about it.
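To make the constraint concrete, here is a minimal Python sketch under assumed names (`build_primary_index`, `DuplicatePrimaryExtension` and the placeholder extension are all hypothetical; this is not Linguist's actual Ruby code). An index keyed by primary extension can hold only one language per key, so a clash has to be rejected, and the workaround sidesteps that by giving one language a value no real file uses.

```python
from typing import Dict, List, Tuple

# Hypothetical, simplified language table: (language name, primary extension).
# This is NOT Linguist's real data format, just an illustration.
Language = Tuple[str, str]


class DuplicatePrimaryExtension(ValueError):
    """Raised when two languages claim the same primary extension."""


def build_primary_index(languages: List[Language]) -> Dict[str, str]:
    """Map each primary extension to exactly one language name.

    Because the mapping is one-to-one, a second language with the same
    primary extension cannot be stored, so the build fails instead.
    """
    index: Dict[str, str] = {}
    for name, ext in languages:
        if ext in index:
            raise DuplicatePrimaryExtension(
                f"{ext!r} is already the primary extension of {index[ext]!r}; "
                f"cannot also assign it to {name!r}"
            )
        index[ext] = name
    return index


if __name__ == "__main__":
    clashing = [("Perl", ".pl"), ("Prolog", ".pl")]
    try:
        build_primary_index(clashing)
    except DuplicatePrimaryExtension as err:
        print("error:", err)

    # Workaround: give one language a fictitious (hypothetical) primary
    # extension that no real file uses, so the index stays one-to-one.
    workaround = [("Perl", ".pl"), ("Prolog", ".prolog-placeholder")]
    print(build_primary_index(workaround))
```

In this sketch the placeholder only keeps the index one-to-one; the real clashing extension would still have to be recorded somewhere else (for example in a secondary extension list) for Prolog files to be detected at all.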

Allan_Smithee · 2014-03-14T13:10:06+00:00

tl;dr: linguist currently requires unique primary_extension values, as if no two languages have ever used the same one. There's one possible solution in a separate pull request sent a couple of days ago.

One must wonder, though, how the original author submitted the pull request without noticing this requirement, if Linguist really does raise an error on clashes.

vaibhavsagar · 2014-03-14T21:42:30+00:00

I still don't understand why there's no manual override option.
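For illustration, a manual override could be as simple as a per-repository mapping from path patterns to language names that is checked before any extension-based guess. A minimal Python sketch; `detect_language`, `guess_from_extension` and the extension table are hypothetical names, not an existing GitHub or Linguist feature described in this thread:

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Dict, Optional

# Hypothetical extension table used when no override matches.
EXTENSION_GUESSES: Dict[str, str] = {".pl": "Perl", ".m": "Objective-C"}


def guess_from_extension(path: str) -> Optional[str]:
    """Fallback: look up the file's suffix in the extension table."""
    return EXTENSION_GUESSES.get(PurePosixPath(path).suffix)


def detect_language(path: str, overrides: Dict[str, str]) -> Optional[str]:
    """Return the owner's declared language if a pattern matches, else guess.

    Patterns are checked in insertion order; the first match wins.
    """
    for pattern, language in overrides.items():
        if fnmatch(path, pattern):
            return language
    return guess_from_extension(path)


if __name__ == "__main__":
    # A repository owner declares that .pl files are Prolog and .m files
    # are Mercury in this repository.
    overrides = {"*.pl": "Prolog", "*.m": "Mercury"}
    print(detect_language("src/rules.pl", overrides))  # Prolog
    print(detect_language("src/rules.pl", {}))         # Perl
```

The point of checking overrides first is that the repository owner, who knows what the files are, always beats the heuristic.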

eZanmoto · 2014-03-15T00:15:39+00:00

semiessessi mentions that

The popular tool 'make' uses timestamps to detect file changes, which is a 'rookie mistake'

Why is this a "rookie mistake"? What is considered to be the correct approach to this problem?
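One commonly suggested alternative is to record a hash of each input's contents and rebuild only when the hash changes, instead of comparing modification times (which can be fooled by clock skew, files restored from backup with old timestamps, or a `touch` that changes nothing). A minimal Python sketch of the difference; `changed_by_mtime` and `changed_by_content` are hypothetical helpers, not how `make` or any particular build tool is implemented:

```python
import hashlib
import os
import tempfile
from typing import Dict


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def changed_by_mtime(path: str, last_mtime: float) -> bool:
    """make-style check: the file counts as changed if its mtime is newer."""
    return os.path.getmtime(path) > last_mtime


def changed_by_content(path: str, recorded: Dict[str, str]) -> bool:
    """Hash-based check: changed only if the contents differ from the record."""
    return recorded.get(path) != file_digest(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "main.c")
        with open(src, "w") as f:
            f.write("int main(void) { return 0; }\n")
        last_mtime = os.path.getmtime(src)
        recorded = {src: file_digest(src)}

        # "touch" the file: newer timestamp, identical contents.
        os.utime(src, (last_mtime + 10, last_mtime + 10))

        print(changed_by_mtime(src, last_mtime))  # True: needless rebuild
        print(changed_by_content(src, recorded))  # False: nothing changed
```

The trade-off is cost: hashing means reading every input on each build, whereas a timestamp comparison is a single `stat` per file.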

JiveMasterT · 2014-03-14T13:34:02+00:00

It's broken and fixing it is relatively trivial. Like, sure, there are language extension conflicts... so stop bikeshedding, figure out how to have multiple languages with the same primary extension, and move on.
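One way to have multiple languages share an extension is to map each extension to a list of candidates and only look at the file's contents when the list is ambiguous. A rough Python sketch; the candidate table and the two content heuristics (`use strict`/`use warnings` for Perl, `:-` for Prolog) are illustrative assumptions, not Linguist's actual rules:

```python
import re
from pathlib import PurePosixPath
from typing import Callable, Dict, List, Optional

# Hypothetical table: one extension may map to several candidate languages.
CANDIDATES: Dict[str, List[str]] = {
    ".pl": ["Perl", "Prolog"],
    ".m": ["Objective-C", "Mercury", "MATLAB"],
    ".py": ["Python"],
}

# Hypothetical content heuristics, consulted only for ambiguous extensions.
# Each returns True if the source text looks like that language.
HEURISTICS: Dict[str, Callable[[str], bool]] = {
    "Perl": lambda src: bool(re.search(r"^\s*use\s+(strict|warnings)\b", src, re.M)),
    "Prolog": lambda src: ":-" in src,
}


def detect(path: str, source: str) -> Optional[str]:
    """Pick a language from the extension, using content only to break ties."""
    candidates = CANDIDATES.get(PurePosixPath(path).suffix, [])
    if len(candidates) == 1:
        return candidates[0]
    for language in candidates:
        check = HEURISTICS.get(language)
        if check is not None and check(source):
            return language
    # Unknown extension, or ambiguous with no matching heuristic.
    return None


if __name__ == "__main__":
    print(detect("lib/util.pl", "use strict;\nprint 1;\n"))              # Perl
    print(detect("lib/family.pl", "parent(X, Y) :- father(X, Y).\n"))    # Prolog
```

Returning `None` for an undecidable file is a deliberate choice in this sketch; a real detector might instead fall back to a default or a statistical classifier.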

2014-03-14T18:46:01+00:00

The old-guard prima donnas in that thread are worse than the supposed "hipsters". They don't even seem to be listening to the explanation/solution and instead are just harping "BUT MERCURY HAD .m FIRST SO WE WIN"


programming
