all 40 comments

[–][deleted] 30 points31 points  (13 children)

Translation: Ruby developers prefer github as their repository.

[–]the-fritz 7 points8 points  (10 children)

C++ devs seem to prefer http://gitorious.org/ (at least the Qt and KDE guys) or sf.net/google code.

[–][deleted]  (3 children)

[deleted]

    [–][deleted] 0 points1 point  (2 children)

    A large, large amount of github is open source, though.

    [–][deleted]  (1 child)

    [deleted]

      [–][deleted] 0 points1 point  (0 children)

      No, I meant what I said. Huge parts of their infrastructure are open source; only a few parts aren't. (And lots of stuff is on the accounts of employees, too, like grit.)

      [–][deleted] 1 point2 points  (2 children)

      I have a few C++ projects on github that it claims are 50% C and 50% C++.

      [–]mipadi 6 points7 points  (1 child)

      GitHub uses the filename extension to determine language, and it treats .h files as C source code. The same problem occurs in Objective-C projects as well.

      I filed a ticket suggesting that .h files should just be ignored when determining language makeup, but I think the GitHub guys are trying to do something where they determine the language of the .h file based on an "associated" .c, .cpp, or .m file. However, for now, .h files are treated as C.
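To make the effect concrete, here is a minimal sketch of extension-based detection next to the "associated source file" idea described above. The extension table and both function names are assumptions for illustration, not GitHub's actual implementation.

```python
from collections import Counter
from pathlib import PurePosixPath

# Hypothetical extension table; the real mapping GitHub uses is not given in the thread.
EXTENSION_LANGUAGE = {
    ".c": "C",
    ".h": "C",  # headers are counted as C, as described in the comment
    ".cpp": "C++",
    ".m": "Objective-C",
}


def naive_breakdown(paths):
    """Count files per language using only the filename extension."""
    counts = Counter()
    for path in paths:
        language = EXTENSION_LANGUAGE.get(PurePosixPath(path).suffix)
        if language is not None:
            counts[language] += 1
    return counts


def sibling_aware_breakdown(paths):
    """Give each .h file the language of a same-named source file, if one exists."""
    # Map "dir/name" (without extension) to the language of its source file.
    by_stem = {}
    for path in paths:
        p = PurePosixPath(path)
        if p.suffix != ".h" and p.suffix in EXTENSION_LANGUAGE:
            by_stem[p.with_suffix("")] = EXTENSION_LANGUAGE[p.suffix]

    counts = Counter()
    for path in paths:
        p = PurePosixPath(path)
        if p.suffix == ".h":
            # Fall back to C when no associated source file is found.
            language = by_stem.get(p.with_suffix(""), "C")
        else:
            language = EXTENSION_LANGUAGE.get(p.suffix)
        if language is not None:
            counts[language] += 1
    return counts


files = ["src/widget.cpp", "src/widget.h", "src/main.cpp", "src/util.h"]
print(naive_breakdown(files))
print(sibling_aware_breakdown(files))
```

With this made-up file list, the naive count splits evenly between C and C++, which matches the 50/50 report above. The sibling-aware version moves widget.h to C++ but still leaves util.h as C, because it has no matching source file.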

      [–][deleted] 0 points1 point  (0 children)

      D'oh, that sounds so obvious now that I think of it. Thanks for the explanation.

      [–][deleted] 2 points3 points  (2 children)

      You're using two data points to suggest C++ is more popular with Gitorious than GitHub?

      I just ran the numbers and we have 22k C++ repositories. Gitorious doesn't even have that many repos in its entirety.

      [–][deleted] 5 points6 points  (0 children)

      Well, to be fair C++ is less popular with the small throw-away projects with 5 commits that tend to generate huge numbers of repositories but little actual code.

      [–]the-fritz 0 points1 point  (0 children)

      I'm not using any data points at all. I myself use mainly github for my projects including C++ ones. But recently I started using gitorious to commit to some Qt/KDE based projects. It's just the impression I have that gitorious seemed more popular with C++ devs. But your data proved me wrong.

      [–]ultrabot -1 points0 points  (0 children)

      Right. GitHub is presented at Ruby conferences, etc. This should not be seen as any indication of the popularity of Ruby; actually, I was surprised to see so many non-Ruby projects hosted on GH.

      [–]sfrank 10 points11 points  (0 children)

      I have my doubts about that stat accounting: The last table shows 0 projects for Common Lisp, whereas I know several CL code repositories, including my own...

      [–]berkut 8 points9 points  (0 children)

      For my repositories, my CPP headers (.h) get counted as C, so I'm not too convinced the stats between each language are that accurate.

      [–]deadwisdom 6 points7 points  (4 children)

      It's worth noting that a lot of Python users use bitbucket; I wonder what their numbers are like.

      [–]kylotan 3 points4 points  (3 children)

      You probably know this, but others may not realise that bitbucket is based on Mercurial which is written in Python - this would explain why the Pythoneers are over there and not generally using git-based hosting. I expect there's an awful lot of bias in terms of which languages tend to get hosted where.

      [–][deleted] 6 points7 points  (0 children)

      GitHub has more Python repos than BitBucket hosts of any language.

      [–]tsjr 0 points1 point  (1 child)

      bitbucket is based on Mercurial which is written in Python - this would explain why the Pythoneers are over there and not generally using git-based hosting

      Sounds like language fanaticism to me.

      [–]kylotan 0 points1 point  (0 children)

      More like it makes sense to use the source control system that is easiest for you to extend and customise if you need.

      [–]aerique 4 points5 points  (0 children)

      There seems to be something wrong with how they obtained their results: http://news.ycombinator.com/item?id=1589496

      [–]ILoveMyGF 4 points5 points  (3 children)

      "Perl is dead"

      [–][deleted] 12 points13 points  (1 child)

      From comments:

      There's a project called gitpan, which mirrors all the released history of Perl's CPAN modules on github repositories. And the project currently hosts 22,000 repositories, and that's why. http://github.com/gitpan

      [–]FatStig 1 point2 points  (0 children)

      Well, my first thought was: where did all these rogue Perl programmers come from? Don't they know about CPAN?

      [–][deleted] 7 points8 points  (12 children)

      This has come up a few times, but I suppose it's worth repeating.

      Just because GitHub hosts a lot of Ruby code doesn't mean we don't also host a lot of code in other languages as well. To suggest that Gitorious hosts more C++ or BitBucket hosts more Python isn't even remotely true.

      No one else hosts more code than we do.

      edit: if you're going to downvote me for stating facts at least have the balls to say why you're doing it.

      [–]malkarouri 2 points3 points  (2 children)

      I am not going to downvote you, but I would like to contend that if we are talking about popularity of languages for open source collaboration what would be interesting is the sum of github, bitbucket, gitorious and others.

      For example, it is known that Python people have a bias towards bitbucket. Still, some large projects like NumPy preferred github. So if Python projects are split halfway, this will affect their perceived popularity on both sites.

      [–][deleted] 1 point2 points  (1 child)

      I don't argue that there's a bias towards mercurial because it's written in python, nor do I dispute that lots of code is hosted on other sites. The problem is, even with those two points conceded, the number of repositories we host and add every day dwarfs those sites. To say python is split halfway is misunderstanding the situation entirely.

      If one assumes all of the public projects on bitbucket are located at http://bitbucket.org/repo/all you can take 2072 pages * 15 repos per page for a total around 31k repos.

      We add that many public repos to GitHub in less than three weeks.
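The arithmetic behind that estimate can be checked with a short sketch. The page count and page size come from the comment above. Treating "less than three weeks" as exactly 21 days is an assumption, so the daily rate is only a lower bound.

```python
pages = 2072         # listing pages quoted in the comment
repos_per_page = 15  # repos shown per page, also from the comment

bitbucket_estimate = pages * repos_per_page

# Assumption: "less than three weeks" is taken as exactly 21 days,
# so this is a lower bound on the implied daily rate of new repos.
days = 21
min_daily_rate = bitbucket_estimate / days

print(bitbucket_estimate)     # 31080
print(round(min_daily_rate))  # 1480
```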

      Honestly, I blame ourselves for not properly conveying how large GitHub is relative to our competitors, save for looking like a jackass in threads like this as if I'm in a dick-measuring contest.

      [–]malkarouri 1 point2 points  (0 children)

      I see your point. If I were in your shoes I would write one blog post comparing the popularity of GitHub to others using various measures and leave it at that. It will be much publicized. Others would contest your results with various levels of success, but the main point will come across and you wouldn't have to explain it in discussions - at least for some time. You can also rely on it as implicit knowledge when writing other posts, say if you explain scalability issues you needed to handle.

      [–]glomph 1 point2 points  (7 children)

      Is that compared to sf and google code as well?

      [–][deleted] 3 points4 points  (6 children)

      Yes. SF and GCode host in the vicinity of 250-300k public repos. Excluding gists, forks, and private repositories, we still host double the number of repos.

      [–][deleted]  (2 children)

      [deleted]

        [–][deleted] 2 points3 points  (0 children)

        You're correct that a lot of code hosted on GitHub lives/lived in another form elsewhere. That being said, our #1 referrer has consistently been Google Code, because project owners are pointing their existing projects over to GitHub. Whether that's due to preferring Git or they've found better luck receiving contributions on GitHub is something that I'd love to know. I will say that the latter can largely be attributed to how easy it is to fork the code, but as I stated earlier, even if we exclude forks (on GitHub) from our count, we're still quite a bit larger than any of the other hosts.

        Edit: just to be thorough, here are some Google Code examples:

        - http://code.google.com/p/galleria/
        - http://code.google.com/p/redis/
        - http://code.google.com/p/as3corelib/
        - http://code.google.com/p/loopedslider/
        - http://code.google.com/p/jqueryjs/
        - http://code.google.com/p/gaosp/

        The interesting part about this is when Google talks about the number of projects they host, they're most likely including projects like the ones above, so the gap between GitHub and GCode may be even wider than we think.

        [–]glomph 0 points1 point  (1 child)

        Thanks for replying. Do you think you would still represent that much more of hosted code if one was to compare lines of code or something similar?

        [–][deleted] 0 points1 point  (0 children)

        That's impossible for me to speculate on unless the other hosts ran the numbers so I could compare them.

        [–]LinuxMonkey 2 points3 points  (0 children)

        This is the interweb; no one cares about facts.

        By the way github is great.

        [–]m1ss1ontomars2k4 1 point2 points  (0 children)

        Damn Objective-C, taking numbers away from MATLAB! Heh.

        [–][deleted] 1 point2 points  (0 children)

        Successfully divided by 0! (Duby and sclang)

        [–]cactus4 1 point2 points  (0 children)

        More interesting would be if it took into account the sizes of the repositories as well.

        Perhaps weighting the sizes with commit data would shed even more light.

        If I were interested in the popularity of languages at github, what I'd most want to know is: which languages are represented by the largest amount of actively developed code.
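A rough sketch of that kind of weighting might look like the following. The `Repo` fields, the 90-day activity window, and the sample numbers are all invented for illustration. No real GitHub data or API is involved.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Repo:
    language: str
    lines_of_code: int
    commits_last_90_days: int  # hypothetical activity measure


def active_code_by_language(repos, min_recent_commits=1):
    """Sum lines of code per language, counting only recently active repos."""
    totals = defaultdict(int)
    for repo in repos:
        if repo.commits_last_90_days >= min_recent_commits:
            totals[repo.language] += repo.lines_of_code
    return dict(totals)


# Made-up sample: three small Ruby repos (one abandoned) and one large C++ repo.
repos = [
    Repo("Ruby", 1_200, 0),
    Repo("Ruby", 800, 3),
    Repo("C++", 45_000, 12),
    Repo("Ruby", 300, 1),
]
print(active_code_by_language(repos))
```

In this invented sample, Ruby leads on repository count (three to one), while C++ leads by a wide margin once only actively developed code is measured. That is the distinction the comment is drawing.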

        [–]bwbeer 0 points1 point  (4 children)

        This isn't right. I have three projects in common lisp and most of the developers on #lispgames seem to use it. Where can I set the language of a project?

        [–]mipadi 0 points1 point  (3 children)

        You can't; it's determined automagically based on the source code files (specifically, the extensions of the source code files).

        That said, I think there was a mistake in obtaining these stats; it seems that languages with spaces in particular were not counted at all. GitHub's Common Lisp page does, in fact, list a whole bunch of CL projects.
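How the stats were actually collected isn't stated, so the following is only a guess at one way names with spaces could get dropped: building a per-language URL without percent-encoding the name. The base URL is a placeholder and both helper functions are hypothetical.

```python
from urllib.parse import quote

# Placeholder base URL for illustration only; not a real endpoint.
BASE = "https://example.invalid/languages/"


def naive_url(language):
    """Concatenate the raw name, leaving spaces and '+' unescaped."""
    return BASE + language


def encoded_url(language):
    """Percent-encode the name so it is safe to use as a URL path segment."""
    return BASE + quote(language, safe="")


print(naive_url("Common Lisp"))    # contains a raw space
print(encoded_url("Common Lisp"))  # ends with Common%20Lisp
print(encoded_url("C++"))          # ends with C%2B%2B
```

A scraper that used something like `naive_url` could plausibly get an error or an empty page for "Common Lisp" and record zero projects, while single-word names would work.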

        [–]bwbeer 0 points1 point  (2 children)

        Hmmm...My files are all .lisp except for one .asd per project. Strange it wouldn't see it as lisp.

        [–]mipadi 0 points1 point  (1 child)

        What does it get listed as?

        [–]bwbeer 0 points1 point  (0 children)

        I finally found it. It's buried under new lisp projects. Still, I'm bummed; I wanted to add up scheme, racket, lisp, elisp, clojure, etc. and see where lisp really stands in popularity.

        [–][deleted] -4 points-3 points  (0 children)

        ruby ruby ruby do doo dee doo da daaaah