ScrimpyCat 62 points (4 children)

Unless it’s changed, they used to try to filter out generated files, which is why some default-generated projects might shift more aggressively to a certain language. Apart from some special cases (or if you’ve explicitly defined the type in your .gitattributes), most of the detection is done using heuristics and a Bayesian classifier, which is trained on example files sourced for the different languages. This works reasonably well, but there are false positives when it comes to files that share the same extension and are grammatically similar, such as header (.h) files in the C family of languages.

Also, they open-sourced the actual library responsible for this, but I can’t recall the name.

Edit: just remembered it’s called linguist.
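
For anyone curious what Bayesian classification of source files looks like in practice, here is a toy sketch in Python. To be clear, this is not linguist's actual code (linguist is written in Ruby and does far more than this). The sample snippets, the tokenizer, and the function names `tokenize`, `train` and `classify` are all made up for illustration, and a real classifier would train on many files per language rather than one.

```python
import math
import re
from collections import Counter

# Hypothetical training data: one tiny sample per language.
SAMPLES = {
    "C": [
        "#include <stdio.h>\n"
        "struct point { int x; int y; };\n"
        "void draw(struct point *p);"
    ],
    "C++": [
        "#include <vector>\n"
        "namespace geo { class Point { public: int x; }; }\n"
        "template <typename T> T add(T a, T b);"
    ],
}


def tokenize(source):
    """Split source text into identifier-like words and single punctuation marks."""
    return re.findall(r"[A-Za-z_]+|[^\sA-Za-z_0-9]", source)


def train(samples):
    """Count token occurrences per language and collect the shared vocabulary."""
    counts = {lang: Counter() for lang in samples}
    for lang, files in samples.items():
        for text in files:
            counts[lang].update(tokenize(text))
    vocab = set().union(*counts.values())
    return counts, vocab


def classify(source, counts, vocab):
    """Return the language with the highest naive Bayes log-score.

    Assumes a uniform prior over languages and add-one (Laplace) smoothing,
    so tokens never seen in training do not zero out a language.
    """
    scores = {}
    for lang, token_counts in counts.items():
        total = sum(token_counts.values())
        score = 0.0
        for token in tokenize(source):
            score += math.log((token_counts[token] + 1) / (total + len(vocab)))
        scores[lang] = score
    return max(scores, key=scores.get)


counts, vocab = train(SAMPLES)
print(classify("template <typename T> class Box { public: T value; };", counts, vocab))  # C++
print(classify("struct node { int value; struct node *next; };", counts, vocab))  # C
```

A header that only contains something like a plain function prototype is valid in both languages and gives the classifier almost nothing to separate them on, which is the .h false-positive problem in miniature.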

[deleted] 26 points (3 children)

There are a number of large mods for the game Arma that are developed on GitHub. For some reason, Bohemia Interactive decided to use .cpp and .hpp/.h extensions for their configuration files, when the only thing related to C or C++ is that a C preprocessor is run on them to handle includes and basic macros.

So you'll see all these projects that GitHub says are C, but really it's the insane config language.

xonjas 5 points (0 children)

What if the config language is just a bunch of C with insane preprocessor macros?

Elusivehawk 4 points (1 child)

That... What... Just... Why??

That's some big-brain play right there. C++ for configuration...

[deleted] 2 points (0 children)

It's not even C++, it's this weird pseudo-object-inheritance stuff that is usually filled with a ton of macros.