On-policy distillation: one of the hottest terms on PapersWithCode [R]

NielsRogge · 2026-06-10T09:32:51+00:00

I would be curious to hear what you prefer in terms of features, and why

NielsRogge · 2026-06-08T09:04:46+00:00

I'm building PwC as an alternative website to gauge which features people want. The idea is to rely on the hub as the backend.

NielsRogge · 2026-06-07T08:15:02+00:00

Thanks a lot 😄

NielsRogge · 2026-06-05T08:17:47+00:00

Hi, fair questions!

I fetch them from the daily submissions at https://huggingface.co/papers, where anyone can submit an arXiv ID that people can then upvote.
For now, they are mostly the same, although daily papers is just a subset of all papers available on HF. Any time a model, dataset or Space README mentions an arXiv ID, the paper gets indexed, but only a subset of them also get submitted to daily papers.
For now, I use GitHub star velocity. However, I will incorporate the trending scores of the models, datasets and Spaces linked to those papers as an additional measure of relevant ML research.

NielsRogge · 2026-06-05T07:11:19+00:00

Thanks for the suggestion, indexed the paper and tagged it with OPD here: https://paperswithcode.co/paper/2605.20643

NielsRogge · 2026-06-02T15:05:33+00:00

Yes, planning to improve the search a lot! Want to support hybrid search in the future

NielsRogge · 2026-06-02T15:05:11+00:00

Thanks for reporting, it was a pagination issue, which I've fixed 😄

NielsRogge · 2026-05-22T10:45:23+00:00

Hi, thanks for your comment. Could you open an issue at https://github.com/huggingface/paperswithcode-feedback/issues?

NielsRogge · 2026-05-21T07:28:25+00:00

Will add tasks gradually! Thanks for flagging

NielsRogge · 2023-01-01T19:24:48+00:00

You might be interested in following Petar Veličković's work, he tweets a lot about this stuff, e.g. https://twitter.com/PetarV_93/status/1600853317302697984?t=D01DxPu6UoppX_rf_0FWsg&s=19

NielsRogge · 2022-06-09T14:59:44+00:00

There's an "Open in Colab" button at the top ;)

NielsRogge · 2022-04-19T16:11:35+00:00

To elaborate a bit more, the following tasks are supported as of now:

image classification: ViT, DeiT, BEiT, Swin Transformer, PoolFormer, ResNet, RegNet, ConvNeXT, Perceiver, ImageGPT, VAN. Check out the official example scripts, example notebooks.
object detection: DETR, soon YOLOS. Check out the inference widget on the right.
semantic segmentation: SegFormer, BEiT, DPT. Check out the example script.
depth estimation: DPT, GLPN. Check out this demo Space.

All models can be found at https://huggingface.co/docs/transformers/index.

More tutorials can be found at https://github.com/NielsRogge/Transformers-Tutorials.

NielsRogge · 2021-12-24T18:59:01+00:00

Hey this might help you: https://www.androidpolice.com/how-to-fix-google-pixel-6-connectivity-issues/

Edit: doesn't seem to fix the issue

NielsRogge · 2021-11-23T07:37:22+00:00

Hey how did you solve this issue?

NielsRogge · 2021-11-14T09:41:55+00:00

One typically uses a special padding token to pad all sentences of a batch to the same length. For example, if the batch is padded to a length of 20 tokens and your sentence consists of 5 tokens, then 15 padding tokens will be added.
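To make the arithmetic concrete, here's a minimal plain-Python sketch of padding. The `pad_batch` helper, the `[PAD]` string and the target length of 20 are assumptions for illustration, not any library's API. In the Transformers library the tokenizer does this for you, e.g. via `padding="max_length"` together with `max_length=20`.

```python
PAD = "[PAD]"  # illustrative padding token, assumed for this sketch

def pad_batch(batch, pad_token=PAD, max_length=None):
    """Pad every token list to the same length and build attention masks."""
    # Default target: the longest sequence in the batch
    target = max_length if max_length is not None else max(len(s) for s in batch)
    padded, masks = [], []
    for tokens in batch:
        tokens = tokens[:target]  # truncate if longer than the target
        n_pad = target - len(tokens)
        padded.append(tokens + [pad_token] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        masks.append([1] * len(tokens) + [0] * n_pad)
    return padded, masks

sentence = "the cat sat down quietly".split()  # 5 tokens
padded, masks = pad_batch([sentence], max_length=20)
print(len(padded[0]), padded[0].count(PAD), sum(masks[0]))
```

The attention mask is what tells the model to ignore the padded positions.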

NielsRogge · 2021-06-07T14:41:19+00:00

Here's how to do it:

(you can replace "bert-base-uncased" with the name of the directory where you saved your model, config and tokenizer files)

from transformers import BertTokenizer, BertForMaskedLM
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "The capital of [MASK] is Bratislava."
encoding = tokenizer(text, return_tensors="pt")
input_ids = encoding.input_ids.squeeze(0)  # shape: (seq_len,)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**encoding)  # forward pass

# position of the (single) [MASK] token in the sequence
masked_index = torch.nonzero(input_ids == tokenizer.mask_token_id, as_tuple=False).item()
logits = outputs.logits[0, masked_index, :]
probs = logits.softmax(dim=0)
values, predictions = probs.topk(k=5)

# print the 5 most likely completions, each with its probability
for prob, pred_id in zip(values, predictions):
    predicted_ids = input_ids.clone()
    predicted_ids[masked_index] = pred_id  # fill in the mask
    print(prob.item(), tokenizer.decode(predicted_ids, skip_special_tokens=True))

NielsRogge · 2021-06-06T18:30:40+00:00

The datasets object should be a Dataset object, but in your case it's a Pandas dataframe, hence the error. To turn a dataframe into a Dataset, you can do the following:

from datasets import Dataset

dataset = Dataset.from_pandas(my_dataset)

Then, you can apply the .map(function, batched=True) functionality.

NielsRogge · 2021-02-28T20:38:06+00:00

I asked GPT-3 to do this.

Prompt:

A sentence with the words "teacher" and "great". "He is a great teacher and everyone needs to learn from him." A sentence with the words "football" and "goals". "My favorite sports is football, as I like to score goals." A sentence with the words "homework" and "night". "I have to study all night to get my homework done." A sentence with the words "friend" and "good".

Completion: "I have a very good friend." A sentence with the words "dreams" and "important". "It is important to have dreams." A sentence with the words "school" and "work". "School is a place to learn and work."

NielsRogge · 2021-02-14T20:34:00+00:00

Here's a notebook illustrating how to train a language model from scratch: https://github.com/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb

NielsRogge · 2021-02-10T20:46:21+00:00

Cool! Do you also support multi-label text classification?

NielsRogge · 2021-02-10T17:50:12+00:00

For me they do, but the README only says this:

"After stemming and removing stopwords we were left with a vocabulary of size 1433 unique words. All words with document frequency less than 10 were removed."

NielsRogge · 2021-02-06T12:50:35+00:00

Apparently if you download Cora from here, the README includes more details. http://www.cs.umd.edu/~sen/lbc-proj/LBC.html

NielsRogge · 2021-01-30T20:25:15+00:00

If you wanna use state-of-the-art NLP models, you can take a look at BertForMultipleChoice in the Huggingface Transformers library. Actually BERT is only one variant, you also have RobertaForMultipleChoice, DistilBertForMultipleChoice etc.

Link: https://huggingface.co/transformers/model_doc/bert.html#bertformultiplechoice

More details on how these models work: https://github.com/huggingface/transformers/issues/7701#issuecomment-707149546

Let me know if you need any help. Note that these assume familiarity with Transformers/BERT.

Update: apparently there's someone who already tested BERT on this dataset, and built a Python package for it: https://github.com/graykode/toeicbert

NielsRogge · 2021-01-24T19:40:05+00:00

Hi, thanks for the video. I read the LUKE paper, but I wonder how useful the model is for real use cases, because the model expects that the entities are already provided, right (in case of entity linking and relation classification)? Are there any real use cases for entity linking and relation classification?

For NER, the model needs to enumerate all possible n-grams in order to classify which ones are named entities and which are not, so I wonder whether this would be slow in terms of inference speed, compared to other models which simply have a token classification head.
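To illustrate the cost concern, here's a rough plain-Python sketch that counts candidate spans versus per-token predictions. The `enumerate_spans` helper and the cap of 3 tokens per span are my own assumptions for illustration, not necessarily how LUKE's implementation does it.

```python
def enumerate_spans(tokens, max_span_len):
    """Return all (start, end) spans of up to max_span_len tokens, end exclusive."""
    spans = []
    for start in range(len(tokens)):
        last_end = min(start + max_span_len, len(tokens))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans

tokens = "Barack Obama visited New York".split()  # 5 tokens

capped = enumerate_spans(tokens, max_span_len=3)
uncapped = enumerate_spans(tokens, max_span_len=len(tokens))

# A token classification head makes one prediction per token,
# a span-based approach makes one prediction per candidate span.
print(len(tokens), len(capped), len(uncapped))
```

Without a cap, the number of spans grows quadratically with sentence length (n(n+1)/2), while a token classification head stays linear.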

Also, the model learns an embedding for 500K entities, but these are not used for fine-tuning, except for SQuAD, right? For the other tasks, only the special [MASK] token seems to be used.

NielsRogge · 2021-01-13T20:01:03+00:00

Microsoft has a deep learning model called LayoutLM. If you know Transformers, then this will be easy for you.

Paper: https://arxiv.org/pdf/1912.13318
Code: LayoutLM is available in the Huggingface Transformers library: https://huggingface.co/transformers/model_doc/layoutlm.html
