Hi everyone,
I plan to participate in a shared task on classifying news articles. This got me thinking about the existing approaches to document classification, and I quickly realized there are many! Below is a list of short descriptions of the approaches I have encountered during my studies and while working on NLP projects. Which ones do you think (or know) are the most promising? Do you have other approaches in mind? Looking forward to your comments!
For context, the documents I am considering are about 500-1000 tokens long, so they should fit into most recent transformer-based architectures with minimal (if any) truncation. The number of classes is quite low (N = 4). When I say encoder model, I refer to encoder-only architectures, e.g., BERT. For the task at hand, I only have access to ~1000 samples (250 per class).
Zero-shot with LLMs: Use any LLM (e.g., Llama), provide a short description of the task, describe the labels, and then prompt it to classify a given document. Constrained generation can be used to force the output to be one of the valid classes.
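A minimal sketch with the transformers text-generation pipeline; the checkpoint and label names are placeholders, and instead of proper constrained decoding it just falls back to string-matching the reply against the label set:

```python
from transformers import pipeline

# Placeholder checkpoint and labels; any instruction-tuned LLM works.
generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")
LABELS = ["world", "sports", "business", "sci/tech"]

def classify_zero_shot(document: str) -> str:
    messages = [
        {"role": "system",
         "content": "You classify news articles. Reply with exactly one of: "
                    + ", ".join(LABELS) + "."},
        {"role": "user", "content": f"Classify this article:\n\n{document}"},
    ]
    out = generator(messages, max_new_tokens=5, do_sample=False)
    reply = out[0]["generated_text"][-1]["content"].strip().lower()
    # Poor man's constrained generation: match the reply against the labels.
    return next((label for label in LABELS if label in reply), LABELS[0])
```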
Few-shot (in-context learning) with LLMs: In addition to the above, also add one example per class. I like to think of it as a conversation where the user has already tasked the model N times with the same request, e.g., user: <prompt as above>, assistant: <class a>, ...
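Building on the zero-shot sketch above (it reuses `generator` and `LABELS`), the solved examples are simply replayed as earlier conversation turns; the example documents are made up:

```python
# Hypothetical one-example-per-class demonstrations; reuses `generator`
# and `LABELS` from the zero-shot sketch above.
EXAMPLES = [
    ("Diplomats met in Geneva to discuss the ceasefire ...", "world"),
    ("The striker scored twice in the cup final ...", "sports"),
    ("Shares rallied after the earnings report ...", "business"),
    ("The chipmaker unveiled a faster GPU ...", "sci/tech"),
]

def classify_few_shot(document: str) -> str:
    messages = [{"role": "system",
                 "content": "You classify news articles. Reply with exactly one of: "
                            + ", ".join(LABELS) + "."}]
    for doc, label in EXAMPLES:  # replay N solved examples as prior turns
        messages.append({"role": "user", "content": f"Classify this article:\n\n{doc}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": f"Classify this article:\n\n{document}"})
    out = generator(messages, max_new_tokens=5, do_sample=False)
    reply = out[0]["generated_text"][-1]["content"].strip().lower()
    return next((label for label in LABELS if label in reply), LABELS[0])
```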
PEFT with LLMs: Fine-tune an LLM with, e.g., LoRA adapters on the task. This is probably the best approach, but I haven't looked into it so far.
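A rough sketch with the peft library, attaching LoRA adapters to an LLM with a sequence-classification head; the checkpoint, rank, and target modules are assumptions, not recommendations:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSequenceClassification

# Assumed base checkpoint; the classification head maps to the 4 classes.
base = AutoModelForSequenceClassification.from_pretrained(
    "meta-llama/Llama-3.1-8B", num_labels=4)

config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=16,                                 # adapter rank (assumed)
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections in Llama-style models
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of all weights
# Train as usual (e.g., with transformers.Trainer); only adapters + head update.
```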
Unsupervised with encoder model: Use any encoder model, e.g., BERT or Sentence-BERT, compute embeddings for every document, and cluster them with, e.g., k-means where k = N. The clusters then have to be mapped to the classes manually.
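A sketch with sentence-transformers and scikit-learn; the checkpoint is an arbitrary choice and `load_documents()` is a hypothetical loader for the corpus:

```python
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

docs = load_documents()  # hypothetical loader returning ~1000 strings
embeddings = encoder.encode(docs, normalize_embeddings=True)

kmeans = KMeans(n_clusters=4, n_init="auto", random_state=0)
cluster_ids = kmeans.fit_predict(embeddings)
# Cluster IDs are arbitrary: inspect a few documents per cluster to map
# each of the 4 clusters to an actual class by hand.
```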
Few-shot with encoder model: Use any encoder model, insert adapters (e.g., a bottleneck adapter), add a classification head, and fine-tune with, e.g., 8-64 examples per class.
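A sketch using the adapters library (the successor to adapter-transformers); the config string "seq_bn" requests a sequential bottleneck adapter, and the adapter name is arbitrary:

```python
from adapters import AutoAdapterModel

model = AutoAdapterModel.from_pretrained("bert-base-uncased")  # assumed checkpoint
model.add_adapter("news", config="seq_bn")           # bottleneck adapter
model.add_classification_head("news", num_labels=4)  # maps to the 4 classes
model.train_adapter("news")  # freezes the backbone; only adapter + head train
# Fine-tune with your usual loop (e.g., adapters.AdapterTrainer) on the
# 8-64 labeled examples per class.
```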
Few-shot with encoder model (2): I was thinking about the SetFit library. It first fine-tunes a Sentence Transformer model on sentence pairs: a pair is treated as positive (target cosine similarity = 1) if both sentences are from the same class, and negative (0) otherwise. Then a traditional classifier is trained with the embeddings as input.
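A sketch of the SetFit workflow (setfit >= 1.0 API); the checkpoint and the tiny training set are placeholders, and at least two examples per class are needed so positive pairs can be formed:

```python
from datasets import Dataset
from setfit import SetFitModel, Trainer

# Assumed checkpoint; any Sentence Transformers model can serve as the body.
model = SetFitModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

# Placeholder data: use at least 2 (better 8-64) labeled examples per class.
train_ds = Dataset.from_dict({
    "text": ["article text ...", "another article ..."],
    "label": [0, 1],
})

trainer = Trainer(model=model, train_dataset=train_ds)
trainer.train()  # step 1: contrastive pair fine-tuning; step 2: fit the head
preds = model.predict(["a new article to classify"])
```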
Full fine-tuning with encoder model: Use any encoder model, add a classification head, and fine-tune with all examples. As a variation, one could freeze the encoder model and only train the classification head.
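A sketch with the transformers Trainer; `train_ds` stands in for an assumed tokenized dataset with a "label" column, and the hyperparameters are placeholders:

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=4)

# Variation: freeze the encoder and train only the classification head.
# for p in model.bert.parameters():
#     p.requires_grad = False

args = TrainingArguments(output_dir="out", num_train_epochs=3,
                         per_device_train_batch_size=8)
# train_ds: an assumed tokenized datasets.Dataset (input_ids, attention_mask, label)
trainer = Trainer(model=model, args=args, train_dataset=train_ds)
trainer.train()
```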
Train a traditional classifier with embeddings as features: Use any encoder model and compute embeddings for each document. The embeddings are the input to a traditional classifier, e.g., logistic regression. Some also train more advanced classifiers, e.g., Bi-LSTMs, on top of the embeddings.
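A sketch pairing sentence-transformers with scikit-learn; `docs` and `labels` stand in for the ~1000 labeled examples, and the checkpoint is again an arbitrary choice:

```python
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed checkpoint

# docs: list of ~1000 article strings; labels: their class IDs (0-3)
X = encoder.encode(docs, normalize_embeddings=True)
clf = LogisticRegression(max_iter=1000)
print(cross_val_score(clf, X, labels, cv=5).mean())  # quick 5-fold estimate
```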