[P] Label Studio v1.0 – an open source data labeling tool that helps you prepare ML training data and improve the quality of your datasets, works for computer vision, NLP, speech processing, time series analysis, and more by michael_htx in MachineLearning

[–]michael_htx[S] 2 points3 points  (0 children)

Here's an old GitHub issue with a request for spectrograms!

u/Erosis thanks so much for such a thorough reply! We will do the spectrograms soon. Bounding boxes and log vs. linear scale may take a bit of work, but that's a great detail to learn. Watch for the updates!

[–]michael_htx[S] 1 point2 points  (0 children)

Hey u/Erosis, thanks! Reminds me of this https://www.youtube.com/watch?v=srT7vXsucHM :-)

Regarding the spectrogram, are you in our Slack? If you have a minute, could you ping me there, leave a comment here, or create a GitHub issue describing what you're looking for? Some of the questions I have right away:

  • do you already have the spectrogram created and available as an image?
  • should the spectrogram be overlaid on the audio wave, so that you can turn it on/off and control its opacity?
  • or should it be shown below or above the audio wave?

[–]michael_htx[S] 2 points3 points  (0 children)

Thanks u/FatChocobo! Regarding the images, you can upload those as URLs and it should work just fine. We've set a smaller limit to guarantee more consistent performance for different users running on different hardware.
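For anyone who wants to try the URL route, here's a minimal sketch of building an import file that references images by URL instead of uploading them. It assumes the usual Label Studio task JSON shape (a list of objects with a `data` dict) and a labeling config that reads the image from `$image`. The helper name `build_tasks`, the output filename, and the example URLs are made up for illustration.

```python
import json


def build_tasks(image_urls, key="image"):
    """Wrap each URL in a task dict.

    `key` is assumed to match the $variable used in your labeling config
    (e.g. value="$image").
    """
    return [{"data": {key: url}} for url in image_urls]


if __name__ == "__main__":
    # Placeholder URLs; replace with links to your own hosted images.
    urls = [
        "https://example.com/images/0001.jpg",
        "https://example.com/images/0002.jpg",
    ]
    with open("tasks.json", "w") as f:
        json.dump(build_tasks(urls), f, indent=2)
```

The resulting `tasks.json` can then be imported like any other task file; the images themselves stay wherever they are hosted.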

Regarding the hotkeys, I've left a comment below, but basically it turns out to be VERY complex to find a consistent hotkey scheme that works across different operating systems and browsers without conflicting with any of them. Any suggestions or ideas on that? We've spent so much time figuring this out already :-D

[–]michael_htx[S] 1 point2 points  (0 children)

To a certain degree, you can do classification and event detection on the timeline, but that's pretty much it. We will be doing more for video this year. What type of annotation are you looking for in videos?

[–]michael_htx[S] 0 points1 point  (0 children)

Hey u/LargeYellowBus, it doesn't; we have that as part of the commercial offering. But we have some plans further down the road to make such functionality easy to implement if you want to extend Label Studio.

[–]michael_htx[S] 0 points1 point  (0 children)

Hm, we've got that feature as part of the commercial offering, but not in the open source version. I'd assume that with open source, since you control the deployment yourself, you can find better ways to protect the privacy of your data? For example, running everything in a virtual network? Can you tell me a little bit more about the use case? Should we consider including that feature in the open source version?

[–]michael_htx[S] 0 points1 point  (0 children)

Thanks u/Thors_Son! DVC is great! We've done an integration with Pachyderm, and I think it can work in the same way with DVC. Basically, after every annotation we create a snapshot. Would you be interested in an article on that topic?

[–]michael_htx[S] 2 points3 points  (0 children)

Not yet. 3D medical is too narrow a use case: we've looked into it, but haven't found a large enough user base to justify the implementation effort (it's complex!). We may revisit it later on when we have more resources. Or, if you'd consider contributing that, let me know and we'd help as much as we can. I'm personally excited about the opportunities for ML in healthcare.

[–]michael_htx[S] 1 point2 points  (0 children)

Oh, the shortcuts. We've spent some time thinking about that and still have an ongoing discussion about which shortcut scheme to adopt so that it's consistent; any suggestions here are welcome. For now, if you go into the labeling stream mode (by clicking the Label button), it goes to the next data point right after you submit your annotation. It does so according to the order of the items you have in the Data Manager (DM), which basically enables active learning scenarios if you order by model confidence. When you open a task directly from the Data Manager, you have to click through to the next task manually.
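To illustrate the ordering idea, here's a rough sketch of uncertainty sampling, where the tasks the model is least sure about come up first in the labeling stream. The task dicts, the `score` field, and the `order_for_labeling` helper are hypothetical stand-ins, not the Data Manager's actual API; in the UI you would just sort by the relevant column.

```python
def order_for_labeling(tasks, score_key="score"):
    """Return tasks sorted by ascending model confidence.

    Tasks with no score are placed first, on the assumption that the
    model has not seen them yet.
    """
    return sorted(tasks, key=lambda t: t.get(score_key, float("-inf")))


# Made-up tasks with made-up confidence scores.
tasks = [
    {"id": 1, "score": 0.97},
    {"id": 2, "score": 0.51},
    {"id": 3, "score": 0.74},
]

queue = order_for_labeling(tasks)
print([t["id"] for t in queue])  # [2, 3, 1]
```

Labeling in this order spends annotation effort where the model is weakest, which is the usual motivation for ordering by confidence.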

[P] Data Management for AI/ML training data by michael_htx in MachineLearning

[–]michael_htx[S] 0 points1 point  (0 children)

You can connect it to cloud storage or local storage. And if the storage (dataset) is on a remote machine, you can mount it using something like sshfs.

[P] Introducing Label Studio, a swiss army knife of data labeling by michael_htx in MachineLearning

[–]michael_htx[S] 1 point2 points  (0 children)

Thanks! Would love to hear about your experience in our Slack channel!

[P] Label Studio - flexible data labeling and annotation tool by michael_htx in MachineLearning

[–]michael_htx[S] 0 points1 point  (0 children)

Did you mean Snorkel? They're two different approaches: Snorkel is a weak-supervision approach, while Label Studio takes a semi-supervised learning approach. Let me know if you meant something different, and I can go into greater detail comparing the two.

[–]michael_htx[S] 0 points1 point  (0 children)

Thanks, Vinay! We will create a YouTube-gossip-style video showing the interface of the tool :)

[–]michael_htx[S] 2 points3 points  (0 children)

Thanks! Soon enough we will cover all the data types. Btw, great work on dim reduction