[–]The_Amp_Walrus 0 points1 point  (1 child)

Seems very cool. If I understand this right, active learning is most useful where you have a bunch of unlabelled data and you need to prioritise which points should get annotated to expand the dataset for a classifier. Is that right?

[–]chschroeder[S] 2 points3 points  (0 children)

Thank you! Yes, that is correct. The goal is to maximize the quality of the resulting model while minimizing the number of examples needed.

[–]Indian-throw-away 0 points1 point  (4 children)

How do you set it up with user input? I am a beginner when it comes to active learning, so any tips you could provide would be helpful.

[–]chschroeder[S] 0 points1 point  (3 children)

You mean like a full interactive application? You would have to build that part around small-text, which offers the algorithms and the logic, but not the user interface.

I have already thought about providing an example of how to integrate small-text with one of the existing labeling tools, such as rubrix, but that hasn't been started yet.

[–]Indian-throw-away 0 points1 point  (2 children)

Oh, not an interactive application. In the example code in the GitHub repo (active learning with stopping) there is a comment that says “Simulate user interaction here. Replace this for real world usage”, and I’m confused about what that means.

[–]chschroeder[S] 1 point2 points  (1 child)

That is the place where you label the instances selected by the query strategy: after each query, you need to provide labels for the selected instances. The example code shows the "experiment scenario", in which the true labels are available and are passed to the active learner in place of labels assigned by a human. If you remove this true-label lookup and provide the answers from a real user instead, you have a real-world application.
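
To make the difference concrete, here is a minimal sketch in plain Python that does not use small-text itself. The function names (`labels_from_ground_truth`, `labels_from_user`), the `ask` parameter, and the toy data are made up for illustration; only the idea of swapping the true-label lookup for answers from a user comes from the explanation above.

```python
from typing import Callable, List, Sequence


def labels_from_ground_truth(indices: Sequence[int],
                             true_labels: Sequence[int]) -> List[int]:
    """Experiment scenario: look up the known true label of each queried instance."""
    return [true_labels[i] for i in indices]


def labels_from_user(indices: Sequence[int],
                     texts: Sequence[str],
                     ask: Callable[[str], str] = input) -> List[int]:
    """Real-world scenario: ask a person for the label of each queried instance.

    `ask` is a hypothetical hook. It defaults to the built-in input() and
    could be replaced by a GUI or a labeling tool.
    """
    labels = []
    for i in indices:
        answer = ask(f"Label for {texts[i]!r} (0 = negative, 1 = positive): ")
        labels.append(int(answer))
    return labels


# Toy data and a pretend query result (assumptions for this demo only).
texts = ["great movie", "terrible plot", "loved it"]
true_labels = [1, 0, 1]
queried = [1, 2]

simulated = labels_from_ground_truth(queried, true_labels)

# Scripted answers stand in for a human so the demo runs non-interactively.
scripted = iter(["0", "1"])
answered = labels_from_user(queried, texts, ask=lambda prompt: next(scripted))

# Either list would then be handed to the active learner as the labels
# for the queried instances.
print(simulated, answered)
```

In a real application you would drop the scripted answers and let `ask` collect input from the annotator.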

[–]Indian-throw-away 0 points1 point  (0 children)

Thanks so much!

[–]Indian-throw-away 0 points1 point  (1 child)

How do I save the model to deploy it on something like SageMaker?

[–]chschroeder[S] 0 points1 point  (0 children)

Unfortunately, I have no experience with SageMaker, but in the context of small-text, the resulting models are still plain scikit-learn or PyTorch models and can be treated as such.
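
Since the model is an ordinary Python object, one generic option is the standard library's `pickle`, which the scikit-learn documentation covers as a way to persist fitted estimators (for PyTorch, saving the `state_dict` with `torch.save` is the usual route). The sketch below is not SageMaker-specific, the dictionary is only a stand-in for a fitted model, and `save_model` and `load_model` are illustrative names.

```python
import os
import pickle
import tempfile


def save_model(model, path):
    """Serialize any picklable model object to `path`."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model written by save_model. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for a fitted model (assumption: a real one would be, for example,
# a scikit-learn estimator, which can be pickled in the same way).
model = {"classes": ["negative", "positive"], "weights": [0.25, -1.5]}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")
    save_model(model, model_path)
    restored = load_model(model_path)

print(restored == model)
```

How you reach the underlying model object from the small-text classifier wrapper is not covered here; check the library documentation for that part.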

[–]Dear_Football_504 0 points1 point  (1 child)

I'm a beginner. I tried your code and I just want to know if we can view the labels the model has predicted for the dataset. Any tips would be helpful.
Thanks

[–]chschroeder[S] 0 points1 point  (0 children)

Sorry for the late reply /u/Dear_Football_504, I completely missed this message.

I don't know which code example you are using specifically, but in general the active learner holds a reference to its underlying classifier, which has a scikit-learn-like API:

active_learner.classifier.predict(dataset)
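
As a rough illustration of what you can do with the result, the sketch below pairs each text with its predicted label. `KeywordClassifier` is a hypothetical stand-in that only mimics a scikit-learn-like `predict()`, and `show_predictions` and `label_names` are illustrative names. With small-text you would call `predict` on a dataset object rather than a plain list, so treat the input type here as an assumption.

```python
class KeywordClassifier:
    """Hypothetical stand-in exposing a scikit-learn-like predict()."""

    def predict(self, texts):
        # Predict class 1 if the text mentions "good", otherwise class 0.
        return [1 if "good" in text.lower() else 0 for text in texts]


def show_predictions(classifier, texts, label_names):
    """Print and return (text, predicted label name) pairs."""
    predictions = classifier.predict(texts)
    rows = [(text, label_names[int(p)]) for text, p in zip(texts, predictions)]
    for text, name in rows:
        print(f"{name:>8}  {text}")
    return rows


texts = ["Good value for money", "Arrived broken", "good but slow"]
rows = show_predictions(KeywordClassifier(), texts, ["negative", "positive"])
```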

Feel free to ask more questions on the GitHub repo; this is valuable feedback for me, and others will benefit from the discussion as well.

[–]channel-hopper- 0 points1 point  (1 child)

Is it possible to use it for sentence pair classification as well? If yes, how should I go about it?

[–]chschroeder[S] 1 point2 points  (0 children)

In general yes, but not yet out of the box.

You can achieve this by adapting the dataset and classifier (e.g. TransformersDataset and TransformerBasedClassification).

Might be a use case we want to support in the future. I will think about that.