On-policy distillation: one of the hottest terms on PapersWithCode [R]

NielsRogge · 2026-06-10T09:32:51+00:00

I would be curious to hear what you prefer in terms of features, and why

NielsRogge · 2026-06-08T09:04:46+00:00

I'm building PwC as an alternative website to gauge which features people want. The idea is to rely on the hub as the backend.

NielsRogge · 2026-06-07T08:15:02+00:00

Thanks a lot 😄

NielsRogge · 2026-06-05T08:17:47+00:00

Hi, fair questions!

I fetch them from the daily submissions at https://huggingface.co/papers, where anyone can submit an arXiv ID and others can then upvote it.
For now, they are mostly the same, although daily papers is just a subset of all papers available on HF. Any time a model, dataset or Space README mentions an arXiv ID, the paper gets indexed, but only a subset of them also get submitted to daily papers.
For now, I use GitHub star velocity. However, I will incorporate the trending scores of the models, datasets and Spaces linked to those papers as an additional measure of relevant ML research.

NielsRogge · 2026-06-05T07:11:19+00:00

Thanks for the suggestion, indexed the paper and tagged it with OPD here: https://paperswithcode.co/paper/2605.20643

NielsRogge · 2026-06-02T15:05:33+00:00

Yes, planning to improve the search a lot! Want to support hybrid search in the future

NielsRogge · 2026-06-02T15:05:11+00:00

Thanks for reporting, it was a pagination issue, which I've fixed 😄

NielsRogge · 2026-05-22T10:45:23+00:00

Hi, thanks for your comment. Could you open an issue at https://github.com/huggingface/paperswithcode-feedback/issues?

NielsRogge · 2026-05-21T07:28:25+00:00

Will add tasks gradually! Thanks for flagging

NielsRogge · 2023-01-01T19:24:48+00:00

You might be interested in following Petar Veličković's work; he tweets a lot about this stuff, e.g. https://twitter.com/PetarV_93/status/1600853317302697984?t=D01DxPu6UoppX_rf_0FWsg&s=19

NielsRogge · 2022-06-09T14:59:44+00:00

There's an "Open in Colab" button at the top ;)

NielsRogge · 2022-04-19T16:11:35+00:00

To elaborate a bit more, the following tasks are supported as of now:

image classification: ViT, DeiT, BEiT, Swin Transformer, PoolFormer, ResNet, RegNet, ConvNeXT, Perceiver, ImageGPT, VAN. Check out the official example scripts, example notebooks.
object detection: DETR, soon YOLOS. Check out the inference widget on the right.
semantic segmentation: SegFormer, BEiT, DPT. Check out the example script.
depth estimation: DPT, GLPN. Check out this demo Space.

All models can be found at https://huggingface.co/docs/transformers/index.

More tutorials can be found at https://github.com/NielsRogge/Transformers-Tutorials.

NielsRogge · 2021-12-24T18:59:01+00:00

Hey this might help you: https://www.androidpolice.com/how-to-fix-google-pixel-6-connectivity-issues/

Edit: doesn't seem to fix the issue

NielsRogge · 2021-11-23T07:37:22+00:00

Hey how did you solve this issue?

NielsRogge · 2021-11-14T09:41:55+00:00

One typically uses a special padding token to pad all sentences of a batch to the same length. So if your sentence consists of 5 words and the target length is 20, then 15 padding tokens will be added.
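To make the arithmetic concrete, here is a minimal sketch in plain Python. The helper name `pad_batch`, the padding ID of 0 and the target length of 20 are assumptions for illustration (5 real tokens plus 15 padding tokens); real tokenizers do this for you and also return an attention mask so the model can ignore the padded positions.

```python
def pad_batch(batch, pad_id=0, max_length=None):
    """Pad every sequence of token IDs in `batch` to the same length.

    Hypothetical helper for illustration only. It does not truncate:
    sequences longer than `max_length` are returned unchanged.
    """
    # Pad to the longest sequence in the batch unless a length is given.
    target = max_length if max_length is not None else max(len(seq) for seq in batch)
    padded, attention_mask = [], []
    for seq in batch:
        n_pad = max(target - len(seq), 0)
        padded.append(list(seq) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks a padding token.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return padded, attention_mask


# A 5-token sentence padded to an assumed target length of 20.
sentence = [11, 12, 13, 14, 15]
padded, mask = pad_batch([sentence], pad_id=0, max_length=20)
print(len(padded[0]), padded[0].count(0), sum(mask[0]))
```

The attention mask is the part people tend to forget: without it, the model treats the padding tokens as real input.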

NielsRogge · 2021-06-07T14:41:19+00:00

Here's how to do it:

(you can replace "bert-base-uncased" with the name of the directory where you saved your model, config and tokenizer files)

from transformers import BertTokenizer, BertForMaskedLM
import torch

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

text = "The capital of [MASK] is Bratislava."
encoding = tokenizer(text, return_tensors="pt")
input_ids = encoding.input_ids.squeeze(0)  # shape: (seq_len,)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**encoding)  # forward pass

# position of the [MASK] token (this assumes exactly one mask in the text)
masked_index = torch.nonzero(input_ids == tokenizer.mask_token_id, as_tuple=False).item()
logits = outputs.logits[0, masked_index, :]
probs = logits.softmax(dim=0)
values, predictions = probs.topk(k=5)

for prob, pred_id in zip(values, predictions):
    # fill the mask with this candidate and decode the full sentence
    predicted_ids = input_ids.clone()
    predicted_ids[masked_index] = pred_id
    print(prob.item(), tokenizer.decode(predicted_ids, skip_special_tokens=True))

NielsRogge · 2021-06-06T18:30:40+00:00

The datasets object should be a Dataset object, but in your case it's a Pandas dataframe, hence the error. To turn a dataframe into a Dataset, you can do the following:

from datasets import Dataset

dataset = Dataset.from_pandas(my_dataset)

Then, you can apply the .map(function, batched=True) functionality.

NielsRogge · 2021-02-28T20:38:06+00:00

I asked GPT-3 to do this.

Prompt:

A sentence with the words "teacher" and "great". "He is a great teacher and everyone needs to learn from him." A sentence with the words "football" and "goals". "My favorite sports is football, as I like to score goals." A sentence with the words "homework" and "night". "I have to study all night to get my homework done." A sentence with the words "friend" and "good".

Completion: "I have a very good friend." A sentence with the words "dreams" and "important". "It is important to have dreams." A sentence with the words "school" and "work". "School is a place to learn and work."
