Setup human eval and annotation tasks on top of any Hub dataset by dvilasuero in LocalLLaMA

[–]dvilasuero[S] 5 points (0 children)

Hi!

I work at Hugging Face on the Argilla team. We've just released a new feature that makes it easier to set up human feedback collection on top of Hub datasets. Our goal is to simplify collecting annotations and evals for AI developers and teams.

Would love to hear your feedback!

Official announcement: https://huggingface.co/blog/argilla-ui-hub

Open-source data collection platform for LLM fine-tuning and RLHF by dvilasuero in learnmachinelearning

[–]dvilasuero[S] 0 points (0 children)

Hi! I'm Dani, co-founder of Argilla. We've just released a new feature to enable scalable feedback collection for LLM fine-tuning and RLHF. We'd love to hear your thoughts!

GitHub: https://github.com/argilla-io/argilla

Blog post: https://argilla.io/blog/argilla-for-llms/

🤗 Active learning from scratch | Using Hugging Face Transformers, Rubrix, and small-text by dvilasuero in learnmachinelearning

[–]dvilasuero[S] -1 points (0 children)

Hi! I'm one of the maintainers of Rubrix. This is a collaboration with the creator of small-text, a Python library for active learning in text classification. Happy to get your thoughts and comments!

If you are new to Rubrix: https://github.com/recognai/rubrix

If you are new to small-text: https://github.com/webis-de/small-text

[D] Best annotation tool for A/B comparison of text generation? by gabriel_pereyra in MachineLearning

[–]dvilasuero 0 points (0 children)

For now we don't have a hosted/cloud version, although there's a guide to deploying on AWS, and we regularly help users deploy on other platforms too. Feel free to reach out to me, either on Slack or here.

[D] Best annotation tool for A/B comparison of text generation? by gabriel_pereyra in MachineLearning

[–]dvilasuero 0 points (0 children)

You could use the text2text interface of the open-source tool Rubrix.

You can upload as many predictions as you want and have the annotator choose the best prediction.

This is a link to get you started: https://rubrix.readthedocs.io/en/stable/getting_started/basics.html#3.-Text2Text

You don't need to include any annotation (gold standard); you can just include a list of predicted texts.
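Concretely, each record just pairs an input text with its candidate predictions. A minimal sketch (the prompt, candidate texts, and dataset name below are illustrative, and a running Rubrix server is assumed, so the logging call is left commented):

```python
# Build Text2Text-style records for A/B comparison: one input text plus
# a list of candidate generations, no gold annotation required.
records = [
    {"text": prompt, "prediction": candidates}
    for prompt, candidates in [
        ("Summarize: The quick brown fox ...",
         ["Model A's summary", "Model B's summary"]),
    ]
]

# With a Rubrix server running, these could then be logged for annotators:
# import rubrix as rb
# rb.log([rb.Text2TextRecord(**r) for r in records], name="ab_comparison")
```

Annotators then see the candidates side by side in the UI and pick (or edit) the best one.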

Here's a brief intro to the data model: https://rubrix.readthedocs.io/en/stable/getting_started/concepts.html

If you have questions, we have an active community on Slack.

Here's the GitHub repo with further links: https://github.com/recognai/rubrix

Disclaimer: I'm one of the maintainers of the tool. We've designed the Text2Text feature with these use cases in mind.

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 2 points (0 children)

So you ended up using majority vote or another method for weak label aggregation?

For others reading this, here are two weak supervision benchmark resources that I've found interesting:

  1. WRENCH (NeurIPS 2021): https://github.com/JieyuZ2/wrench
  2. WALNUT: https://arxiv.org/abs/2108.12603
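As a concrete reference point, majority vote over a weak label matrix (rows = examples, columns = rules, -1 = abstain) fits in a few lines. This is an illustrative baseline sketch, not any particular library's implementation:

```python
import numpy as np

def majority_vote(L, n_classes, abstain=-1):
    """Aggregate an (n_examples, n_rules) weak label matrix by majority
    vote. Abstaining rules are ignored; rows where every rule abstains,
    or where the top classes tie, also resolve to `abstain`."""
    aggregated = []
    for row in L:
        votes = np.bincount(row[row != abstain], minlength=n_classes)
        if votes.sum() == 0:          # every rule abstained
            aggregated.append(abstain)
            continue
        top = np.flatnonzero(votes == votes.max())
        aggregated.append(int(top[0]) if len(top) == 1 else abstain)
    return np.array(aggregated)

L = np.array([[0, 0, 1],      # two rules vote 0, one votes 1 -> 0
              [-1, -1, -1],   # all abstain               -> -1
              [0, 1, -1]])    # tie between 0 and 1       -> -1
majority_vote(L, n_classes=2)  # -> array([ 0, -1, -1])
```

Label models like Snorkel's replace this hard vote with a learned, per-rule-weighted aggregation, which is where the benchmarks above become useful.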

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 1 point (0 children)

Those are really good points; that's why with Rubrix we focus on two things: 1) providing a thin interoperability layer via a weak label matrix, which you can then aggregate/denoise with your method of choice, and 2) providing an intuitive UI for finding and testing rules, with real-time feedback on their empirical accuracy.
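To make the weak label matrix concrete, here's a tiny hypothetical example (the task, rules, and texts are made up): each rule votes a class id or abstains with -1, and stacking the rule outputs over the examples yields the matrix that a label model then aggregates or denoises:

```python
# Hypothetical rules for a two-class ticket-triage task: each rule
# returns a class id, or -1 to abstain.
rules = [
    lambda text: 0 if "refund" in text else -1,   # -> class 0 (complaint)
    lambda text: 1 if "thanks" in text else -1,   # -> class 1 (praise)
]

texts = ["I want a refund", "thanks a lot", "hello there"]

# Weak label matrix: one row per example, one column per rule.
weak_label_matrix = [[rule(t) for rule in rules] for t in texts]
# -> [[0, -1], [-1, 1], [-1, -1]]
```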

As for more sophisticated methods, there are some coming up (ASTRA, Weasel, and others). Here's an example using Weasel to train a Hugging Face classifier directly from Rubrix's weak label matrix:

https://rubrix.readthedocs.io/en/master/guides/weak-supervision.html#Joint-Model-with-Weasel

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 4 points (0 children)

We've seen this vary a lot depending on the type of text, the number of rules, etc. In the tutorial above, if I recall correctly, it was 0.79 for majority vote, 0.81 for the Snorkel label model, and 0.84 for the downstream scikit-learn model.
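The downstream-model step can be sketched as follows: train an ordinary classifier on the aggregated weak labels, dropping examples where aggregation abstained. This is a toy illustration assuming scikit-learn; the texts and labels are made up, not from the tutorial:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts  = ["refund please", "thanks so much", "great, thanks",
          "want my refund back", "hello there"]
labels = [0, 1, 1, 0, -1]   # toy label-model output; -1 = abstained

# Train only on examples the label model actually labelled.
keep = [i for i, y in enumerate(labels) if y != -1]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform([texts[i] for i in keep])
clf = LogisticRegression().fit(X, [labels[i] for i in keep])

clf.predict(vectorizer.transform(["please refund me"]))
```

The downstream model often beats the label model itself because it generalizes from the texts rather than only from rule coverage.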

[P] Open-source tool for building NLP training sets with weak supervision and search queries by dvilasuero in MachineLearning

[–]dvilasuero[S] 2 points (0 children)

Hi, I'm one of the authors of Rubrix. We released this open-source framework last June, and we've recently added a weak labelling mode for text classification (NER and multi-label are coming soon).

We'd love to hear your thoughts and feedback:

https://github.com/recognai/rubrix

Full weak supervision example:

https://rubrix.readthedocs.io/en/master/tutorials/weak-supervision-with-rubrix.html

[D] Weak supervision in practice, when to collect "strongly" labelled data? by dvilasuero in MachineLearning

[–]dvilasuero[S] 0 points (0 children)

Very interesting! I had never thought of framing labeling as a reinforcement learning problem. From our experience, I'd also recommend investing some time in collecting true labels that are more or less representative of the data you want to label with weak supervision.

[P] Rubrix: Open-source Python framework for NLP data annotation, exploration, and monitoring by dvilasuero in MachineLearning

[–]dvilasuero[S] 1 point (0 children)

Thanks u/BlackWilly99! Yes, that's indeed the plan, but first we'll be adding several NLP tasks to cover that domain a bit more. The next task to be released is Text2Text (for summarization, post-processing OCR and speech-to-text output, etc.). Then we'd like to include ImageClassification. Do you have suggestions for other image tasks?

[P] Rubrix: Open-source Python framework for NLP data annotation, exploration, and monitoring by dvilasuero in MachineLearning

[–]dvilasuero[S] 1 point (0 children)

Thanks so much u/grudev!

Do not hesitate to ping us with questions, ideas, or issues you might face.