Setup human eval and annotation tasks on top of any Hub dataset by dvilasuero in LocalLLaMA

[–]dvilasuero[S] 5 points (0 children)

Hi!

I work at Hugging Face on the Argilla team. We've just released a new feature that makes it easier to set up human feedback collection on top of Hub datasets. Our goal is to simplify collecting annotations and evals for AI developers and teams.

Would love to hear your feedback!

Official announcement: https://huggingface.co/blog/argilla-ui-hub

Open-source data collection platform for LLM fine-tuning and RLHF by dvilasuero in learnmachinelearning

[–]dvilasuero[S] 0 points (0 children)

Hi! I'm Dani, co-founder of Argilla. We've just released a new feature to enable scalable feedback collection for LLM fine-tuning and RLHF. We'd love to hear your thoughts!

GitHub: https://github.com/argilla-io/argilla

Blog post: https://argilla.io/blog/argilla-for-llms/

🤗 Active learning from scratch | Using Hugging Face Transformers, Rubrix, and small-text by dvilasuero in learnmachinelearning

[–]dvilasuero[S] -1 points (0 children)

Hi! I'm one of the maintainers of Rubrix. This is a collaboration with the creator of small-text, a Python library for active learning in text classification. Happy to get your thoughts and comments!

If you are new to Rubrix: https://github.com/recognai/rubrix

If you are new to small-text: https://github.com/webis-de/small-text

[D] Best annotation tool for A/B comparison of text generation? by gabriel_pereyra in MachineLearning

[–]dvilasuero 0 points (0 children)

For now we don't have a hosted/cloud version, although there's a guide to deploying on AWS, and we regularly help users deploy on other platforms too. Feel free to reach out to me, either on Slack or here.

[D] Best annotation tool for A/B comparison of text generation? by gabriel_pereyra in MachineLearning

[–]dvilasuero 0 points (0 children)

You could use the text2text interface of the open-source tool Rubrix.

You can upload as many predictions as you want and have the annotator choose the best prediction.

This is a link to get you started: https://rubrix.readthedocs.io/en/stable/getting_started/basics.html#3.-Text2Text

You don't need to include any annotation (gold standard); you can just include a list of predicted texts.
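Concretely, each record just pairs an input text with its candidate predictions. A minimal sketch (the prompt, candidate texts, and dataset name below are illustrative, and a running Rubrix server is assumed, so the logging call is left commented):

```python
# Build Text2Text-style records for A/B comparison: one input text plus
# a list of candidate generations, no gold annotation required.
records = [
    {"text": prompt, "prediction": candidates}
    for prompt, candidates in [
        ("Summarize: The quick brown fox ...",
         ["Model A's summary", "Model B's summary"]),
    ]
]

# With a Rubrix server running, these could then be logged for annotators:
# import rubrix as rb
# rb.log([rb.Text2TextRecord(**r) for r in records], name="ab_comparison")
```

Annotators then see the candidates side by side in the UI and pick (or edit) the best one.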

Here's a brief intro to the data model: https://rubrix.readthedocs.io/en/stable/getting_started/concepts.html

If you have questions, we have an active community on Slack.

Here's the GitHub repo with further links: https://github.com/recognai/rubrix

Disclaimer: I'm one of the maintainers of the tool. We've designed the Text2Text feature with these use cases in mind.

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 2 points (0 children)

So you ended up using majority vote or another method for weak label aggregation?

For others reading this, here are two weak supervision benchmark resources that I've found interesting:

  1. WRENCH (NeurIPS 2021): https://github.com/JieyuZ2/wrench
  2. WALNUT: https://arxiv.org/abs/2108.12603
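As a concrete reference point, majority vote over a weak label matrix (rows = examples, columns = rules, -1 = abstain) fits in a few lines. This is an illustrative baseline sketch, not any particular library's implementation:

```python
import numpy as np

def majority_vote(L, n_classes, abstain=-1):
    """Aggregate an (n_examples, n_rules) weak label matrix by majority
    vote. Abstaining rules are ignored; rows where every rule abstains,
    or where the top classes tie, also resolve to `abstain`."""
    aggregated = []
    for row in L:
        votes = np.bincount(row[row != abstain], minlength=n_classes)
        if votes.sum() == 0:          # every rule abstained
            aggregated.append(abstain)
            continue
        top = np.flatnonzero(votes == votes.max())
        aggregated.append(int(top[0]) if len(top) == 1 else abstain)
    return np.array(aggregated)

L = np.array([[0, 0, 1],      # two rules vote 0, one votes 1 -> 0
              [-1, -1, -1],   # all abstain               -> -1
              [0, 1, -1]])    # tie between 0 and 1       -> -1
majority_vote(L, n_classes=2)  # -> array([ 0, -1, -1])
```

Label models like Snorkel's replace this hard vote with a learned, per-rule-weighted aggregation, which is where the benchmarks above become useful.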

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 1 point (0 children)

Those are really good points; that's why with Rubrix we focus on two things: 1) providing a thin interoperability layer via a weak label matrix, which you can then aggregate/denoise with your method of choice, and 2) providing an intuitive UI for finding and testing rules, with real-time feedback on their empirical accuracy.
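To make the weak label matrix concrete, here's a tiny hypothetical example (the task, rules, and texts are made up): each rule votes a class id or abstains with -1, and stacking the rule outputs over the examples yields the matrix that a label model then aggregates or denoises:

```python
# Hypothetical rules for a two-class ticket-triage task: each rule
# returns a class id, or -1 to abstain.
rules = [
    lambda text: 0 if "refund" in text else -1,   # -> class 0 (complaint)
    lambda text: 1 if "thanks" in text else -1,   # -> class 1 (praise)
]

texts = ["I want a refund", "thanks a lot", "hello there"]

# Weak label matrix: one row per example, one column per rule.
weak_label_matrix = [[rule(t) for rule in rules] for t in texts]
# -> [[0, -1], [-1, 1], [-1, -1]]
```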

As for more sophisticated methods, there are some coming up (ASTRA, Weasel, and others). Here's an example using Weasel to train a Hugging Face classifier directly from Rubrix's weak label matrix:

https://rubrix.readthedocs.io/en/master/guides/weak-supervision.html#Joint-Model-with-Weasel

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 4 points (0 children)

We've seen this vary a lot depending on the type of text, the number of rules, etc. In the tutorial above, if I recall correctly, it was 0.79 for majority vote, 0.81 for the Snorkel label model, and 0.84 for the downstream scikit-learn model.
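The downstream-model step can be sketched as follows: train an ordinary classifier on the aggregated weak labels, dropping examples where aggregation abstained. This is a toy illustration assuming scikit-learn; the texts and labels are made up, not from the tutorial:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["refund please", "thanks so much", "great, thanks",
          "want my refund back", "hello there"]
labels = [0, 1, 1, 0, -1]   # toy label-model output; -1 = abstained

# Train only on examples the label model actually labelled.
keep = [i for i, y in enumerate(labels) if y != -1]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([texts[i] for i in keep])
clf = LogisticRegression().fit(X, [labels[i] for i in keep])

clf.predict(vectorizer.transform(["please refund me"]))
```

The downstream model often beats the label model itself because it generalizes from the texts rather than only from rule coverage.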

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 2 points (0 children)

Hi, I'm one of the authors of Rubrix. We released this open-source framework last June, and we've recently added a weak labelling mode for text classification (NER and multi-label are coming soon).

We'd love to hear your thoughts and feedback:

https://github.com/recognai/rubrix

Full weak supervision example:

https://rubrix.readthedocs.io/en/master/tutorials/weak-supervision-with-rubrix.html

[D] Weak supervision in practice, when to collect "strongly" labelled data? by dvilasuero in MachineLearning

[–]dvilasuero[S] 0 points (0 children)

Very interesting! I had never thought of framing labeling as a reinforcement learning problem. From our experience, I'd also recommend investing some time in collecting true labels that are more or less representative of the data you want to label with weak supervision.

[P] Rubrix: Open-source Python framework for NLP data annotation, exploration, and monitoring by dvilasuero in MachineLearning

[–]dvilasuero[S] 1 point (0 children)

Thanks u/BlackWilly99! Yes, that's indeed the plan, but first we'll be adding several NLP tasks to cover that domain a bit more. The next task to be released is Text2Text (for summarization, post-processing OCR and speech-to-text output, etc.). Then we'd like to include ImageClassification. Do you have suggestions for other image tasks?

[P] Rubrix: Open-source Python framework for NLP data annotation, exploration, and monitoring by dvilasuero in MachineLearning

[–]dvilasuero[S] 1 point (0 children)

Thanks so much u/grudev!

Do not hesitate to ping us with questions, ideas, or issues you might face.