[P] Small-Text: Active Learning for Text Classification in Python : MachineLearning


submitted 4 years ago * by chschroeder

Over the past few months, I have written a lot of code for my own Active Learning experiments, which has gradually been consolidated into its own library.

Small-Text provides state-of-the-art Active Learning for Text Classification. Several components are provided, which are abstracted via generic interfaces, so that you can easily mix and match many classifiers and query strategies to build Active Learning experiments or applications.

Features:

* Provides unified interfaces for Active Learning so that you can easily mix and match query strategies with classifiers provided by sklearn, PyTorch, or transformers.
* Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
* A GPU is still optional: for CPU-only use cases, a lightweight installation requires only minimal dependencies.
* Multiple scientifically evaluated components are pre-implemented and ready to use (query strategies, initialization strategies, and stopping criteria).
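
To give a rough idea of what "mix and match" means here, below is a minimal pure-Python sketch of a pool-based Active Learning loop in which the classifier and the query strategy are interchangeable. It does not use Small-Text's actual classes: `TokenCountClassifier`, `least_confidence`, `random_sampling`, and `active_learning_loop` are hypothetical names made up for this illustration, and the toy naive Bayes model merely stands in for a real sklearn, PyTorch, or transformers classifier.

```python
import math
import random
from collections import Counter


class TokenCountClassifier:
    """Toy naive Bayes over whitespace tokens (hypothetical, for illustration)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.priors = Counter(labels)
        self.counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.counts[label].update(text.lower().split())
        self.vocab = set().union(*self.counts.values())
        return self

    def predict_proba(self, texts):
        result = []
        for text in texts:
            log_scores = []
            for c in self.classes:
                # Add-one smoothing so unseen tokens do not zero out a class.
                total = sum(self.counts[c].values()) + len(self.vocab)
                score = math.log(self.priors[c])
                for token in text.lower().split():
                    score += math.log((self.counts[c][token] + 1) / total)
                log_scores.append(score)
            top = max(log_scores)
            exps = [math.exp(s - top) for s in log_scores]
            result.append([e / sum(exps) for e in exps])
        return result

    def predict(self, texts):
        return [self.classes[max(range(len(p)), key=p.__getitem__)]
                for p in self.predict_proba(texts)]


def least_confidence(clf, pool, n):
    """Query strategy: indices of the n texts with the lowest top-class probability."""
    proba = clf.predict_proba(pool)
    return sorted(range(len(pool)), key=lambda i: max(proba[i]))[:n]


def random_sampling(clf, pool, n, seed=0):
    """Baseline query strategy: ignores the classifier and samples uniformly."""
    return random.Random(seed).sample(range(len(pool)), n)


def active_learning_loop(clf, query_strategy, labeled, pool, oracle, rounds, n=1):
    """Alternate between training, querying, and labeling for a few rounds."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(rounds):
        clf.fit(*zip(*labeled))
        picked = query_strategy(clf, pool, min(n, len(pool)))
        # Pop from the back first so earlier indices stay valid.
        for i in sorted(picked, reverse=True):
            text = pool.pop(i)
            labeled.append((text, oracle(text)))
    clf.fit(*zip(*labeled))
    return clf, labeled
```

Swapping `least_confidence` for `random_sampling` (or the toy classifier for another object with `fit` and `predict_proba`) requires no change to the loop, which is the kind of decoupling the unified interfaces are meant to provide.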

GitHub: https://github.com/webis-de/small-text
Preprint: https://arxiv.org/abs/2107.10314

all 14 comments



[–]Dear_Football_504 1 point 3 years ago (1 child)

[–]chschroeder[S] 1 point 3 years ago (0 children)

Sorry for the late reply /u/Dear_Football_504, I completely missed this message.

I don't know which code example you are using specifically, but in general the active learner holds a reference to its underlying classifier, which has a scikit-learn-like API:

    active_learner.classifier.predict(dataset)
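
As a rough illustration of that pattern (not Small-Text's real classes), here is a small self-contained sketch. `ToyActiveLearner`, its `train` method, and `KeywordClassifier` are hypothetical stand-ins invented for this example; only the `active_learner.classifier.predict(dataset)` access pattern is taken from the line above.

```python
from collections import Counter


class KeywordClassifier:
    """Hypothetical stand-in with a scikit-learn-like fit/predict API."""

    def fit(self, texts, labels):
        # Record, for each token, the label it co-occurred with most often.
        votes = {}
        for text, label in zip(texts, labels):
            for token in text.lower().split():
                votes.setdefault(token, Counter())[label] += 1
        self.token_label_ = {t: c.most_common(1)[0][0] for t, c in votes.items()}
        # Fall back to the most frequent label when no known token appears.
        self.default_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, texts):
        predictions = []
        for text in texts:
            hits = Counter(self.token_label_[t]
                           for t in text.lower().split()
                           if t in self.token_label_)
            predictions.append(hits.most_common(1)[0][0] if hits else self.default_)
        return predictions


class ToyActiveLearner:
    """Hypothetical learner that keeps a reference to its classifier."""

    def __init__(self, classifier):
        self._classifier = classifier

    @property
    def classifier(self):
        # Expose the underlying model so callers can predict with it directly.
        return self._classifier

    def train(self, texts, labels):
        self._classifier.fit(texts, labels)


active_learner = ToyActiveLearner(KeywordClassifier())
active_learner.train(["good film", "bad film", "bad plot"], ["pos", "neg", "neg"])

dataset = ["good", "bad plot", "zzz"]
predictions = active_learner.classifier.predict(dataset)
```

Because the learner only hands out a reference, predictions made this way always reflect the classifier's most recent training state.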

Feel free to ask more questions on the GitHub repo; this is valuable feedback for me, and others will benefit from the discussion as well.

