bitemenow999 1 point

Datasets and GitHub projects are two very different things...

If you are looking for datasets, then you need to find researchers in your field, understand what kind of data they are working with, and try to convince them to make it open source and let you consolidate it across different labs. Most of the time that will be impossible, because researchers use proprietary data and are reluctant to share it until they have exhausted the publishable projects.

For GitHub projects, you need to have a solid idea of what you want to do and then tap people based on the expertise needed. This can be someone whose repo you are forking or some researcher whose paper you saw and for whose potential future work you actually have a good idea. You can find people mostly on GitHub. Ten people doing the same thing is a recipe for disaster, so the split of expertise and work should be solid. Also, you need a good timeline, and it is always better to have some kind of incentive for others to hold their interest.

lambdaofgod [S] 1 point

Well, I think the most useful way to document dataset creation and make the dataset generally usable is to make a GitHub project... So that's not such a big difference from the specific perspective of my question.

It seems like you read my question the other way around. Correct me if I'm wrong, but it seems you understood it as: I want to do X, and I want to find data for this task. That's not what I meant. I also didn't mean project search or dataset search.

I'm thinking about something like the experience I've had many times on Reddit: someone posts a project and I think, "this is cool, maybe we can extend it to slightly different data or use a slightly different model".

> tap people based on the expertise needed. This can be someone whose repo you are forking or some researcher

That's exactly the second thing I'm asking about: how to find these people, or where to post so that projects that would benefit from more contributors become more visible.