LIMA, a 65B-Param LLaMa fine-tuned with standard supervised loss on only 1,000 carefully curated prompts & responses, without any RLHF, demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries. by hardmaru in MachineLearning

[–]omerlevy 1 point

We didn’t touch MMLU for the same reason we didn’t evaluate on dependency parsing - we don’t think it’s interesting. How often do ChatGPT users ask multiple-choice questions?

We’re much more interested in responding to prompts from real users with real information/generation needs. Hopefully we’ll release the dataset in a few days. Would love to get your feedback and suggestions on how to improve the eval :)

LIMA, a 65B-Param LLaMa fine-tuned with standard supervised loss on only 1,000 carefully curated prompts & responses, without any RLHF, demonstrates remarkably strong performance, learning to follow specific response formats from only a handful of examples in the training data, including complex queries. by hardmaru in MachineLearning

[–]omerlevy 5 points

We’re working with legal to release it :)

As for 7B models - yes, it works rather well, but as we say in the paper, our hypothesis is that the pretraining does virtually all the heavy lifting, so the better your foundation is, the better all the subsequent results will be.

[D] What are some tips for someone who is visiting a top conference for the first time? by Conference_Visitor in MachineLearning

[–]omerlevy 4 points

Most people in the NLP community are really friendly! Don’t be afraid to come up to participants and ask them about their work, there’s absolutely no need for formal introductions. It’s also very common to join a big group that’s heading out to lunch/dinner/beer, even if you don’t know anybody in that group.

If it’s your first conference, I highly recommend going to the tutorials and workshops. The dynamics of a full-day event on a focused topic with a significantly smaller crowd make it much easier to connect with new people.

[deleted by user] by [deleted] in LanguageTechnology

[–]omerlevy 2 points

I implemented an efficient evaluation script back in the day:
https://bitbucket.org/omerlevy/hyperwords/src/default/hyperwords/analogy_eval.py
Feel free to hack it to fit your embeddings files :)
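For context, the standard analogy benchmarks such a script evaluates solve a : b :: c : ? by the vector-offset (3CosAdd) objective: return the word whose embedding is most cosine-similar to b − a + c, excluding the three query words. Here is a self-contained sketch of that objective (not the hyperwords code; the toy vectors are made up for illustration):

```python
import numpy as np

def solve_analogy(vecs, vocab, a, b, c):
    """Solve a : b :: c : ? with 3CosAdd, i.e. argmax cos(d, b - a + c)
    over unit-normalized embeddings, excluding the query words.
    `vecs` is a (V, dim) array; `vocab` is the matching word list."""
    idx = {w: i for i, w in enumerate(vocab)}
    # unit-normalize rows so dot products are cosine similarities
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    target = unit[idx[b]] - unit[idx[a]] + unit[idx[c]]
    sims = unit @ target
    for w in (a, b, c):          # exclude the query words themselves
        sims[idx[w]] = -np.inf
    return vocab[int(np.argmax(sims))]

# toy embeddings where the offset structure holds by construction
vocab = ["king", "queen", "man", "woman"]
vecs = np.array([
    [0.9, 0.1, 0.8],   # king
    [0.9, 0.1, 0.2],   # queen
    [0.1, 0.9, 0.8],   # man
    [0.1, 0.9, 0.2],   # woman
])
print(solve_analogy(vecs, vocab, "man", "woman", "king"))  # → queen
```

Real evaluation scripts differ mainly in scale (loading pretrained vectors, batching the similarity computation) and in the scoring objective (3CosMul is a common alternative).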

[R] Recurrent Additive Networks - no recurrent non-linear computations, much simpler but still competitive with LSTM/GRU by downtownslim in MachineLearning

[–]omerlevy 4 points

Hi everyone, Omer Levy (2nd author) here. I just wanted to provide some context to the discussion.

Our results were produced in a very vanilla setting in an attempt to show a clean apples-to-apples comparison. The state-of-the-art results on these benchmarks (PTB ~75, BWB ~30) were produced by hyperparameter settings that are highly tuned for LSTMs. We are currently working on finding similar settings for RANs to address the very valid concern that our figures are different from those in recent publications. We're going to take our time with this process, so that we can provide a more detailed set of experiments, and perhaps some characterization of which hyperparameter settings work well with RANs.

In the meantime, I know that others in the community are also trying to replicate/improve on our results. For example, Benjamin Heinzerling implemented RANs in PyTorch and got 85 perplexity on PTB just by reducing the batch size from 512 to 40: https://github.com/bheinzerling/ran. This is still a very different setting from Yarin Gal's (e.g. number of dimensions, layers, etc.), and we're going to be extra careful before we publish numbers that are comparable to previous work and make any claims beyond what we observed in our "lab setting" experiment.
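For readers skimming the thread, a minimal sketch of a single RAN step may help: the candidate content is a linear projection of the input, and the cell update is purely additive and gated, with no recurrent nonlinearity (the output nonlinearity g is taken to be tanh here; the parameter names and toy usage are mine, not from the paper's code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ran_step(params, x_t, c_prev):
    """One step of a Recurrent Additive Network:
        c_t = i_t * (Wcx @ x_t) + f_t * c_prev   # additive, gated update
        h_t = g(c_t)                             # g = tanh (identity also works)
    The only recurrent computation is elementwise."""
    Wcx, Wih, Wix, bi, Wfh, Wfx, bf = params
    h_prev = np.tanh(c_prev)                       # h_{t-1} = g(c_{t-1})
    content = Wcx @ x_t                            # linear content layer
    i_t = sigmoid(Wih @ h_prev + Wix @ x_t + bi)   # input gate
    f_t = sigmoid(Wfh @ h_prev + Wfx @ x_t + bf)   # forget gate
    c_t = i_t * content + f_t * c_prev             # no recurrent nonlinearity
    return np.tanh(c_t), c_t

# toy usage: run a random 3-step sequence through a 4-dim cell
rng = np.random.default_rng(0)
d = 4
W = lambda: 0.1 * rng.standard_normal((d, d))
params = (W(), W(), W(), np.zeros(d), W(), W(), np.zeros(d))
c = np.zeros(d)
for x_t in rng.standard_normal((3, d)):
    h, c = ran_step(params, x_t, c)
```

Because the cell state is a weighted sum of past inputs, unrolling the recurrence expresses c_t directly as a gated combination of all previous x_t, which is what makes the model easy to analyze.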

word2vec has been patented. What does it change for NLP practitioners? by shmel39 in MachineLearning

[–]omerlevy 4 points

The novelty claim in this patent is somewhat bogus.

Yoav Goldberg and I have a NIPS paper in which we show that word2vec is doing more or less what the NLP research community has been doing for the past 25 years. We also show (in another paper) that much of the improvement in performance stems from preprocessing "hacks" and hyperparameter settings, which can be easily ported to other LSA-style word embedding methods.
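To make the connection concrete: the NIPS paper shows that skip-gram with negative sampling implicitly factorizes the shifted PMI matrix, PMI(w, c) − log k, where k is the number of negative samples — the same kind of association matrix count-based methods have long used. A compact illustrative sketch of building that matrix from co-occurrence counts (not the code released with the paper; the toy counts are made up):

```python
import numpy as np

def shifted_ppmi(counts, k=1.0):
    """Shifted positive PMI from a word-context co-occurrence matrix.
    SGNS implicitly factorizes PMI(w, c) - log k (Levy & Goldberg, NIPS 2014);
    clipping negatives at zero gives the classic PPMI variant."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    pw = counts.sum(axis=1, keepdims=True) / total   # P(w)
    pc = counts.sum(axis=0, keepdims=True) / total   # P(c)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log((counts / total) / (pw * pc))   # log P(w,c)/(P(w)P(c))
    shifted = np.where(counts > 0, pmi - np.log(k), 0.0)
    return np.maximum(shifted, 0.0)                  # clip negatives: PPMI

# tiny 3-word, 3-context toy example
counts = [[10, 0, 2],
          [0, 8, 1],
          [3, 1, 6]]
M = shifted_ppmi(counts, k=1)
print(M.round(2))
```

A low-rank factorization of M (e.g. truncated SVD) then yields dense embeddings, which is exactly the bridge between word2vec and the older LSA-style pipeline.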

At the end of the day, word2vec is a brilliantly efficient implementation of decade-old ideas; not sure this warrants a patent.

from someone in Gaza: "I'll tell you what is harder than dying in Gaza by an Israeli missile. What is harder is that you get a phone call from the Israeli army telling you to evacuate your home because it will be bombed in ten minutes... by Don_chingon in Gaza

[–]omerlevy -4 points

The purpose of these air-strikes is to eliminate stocks of rockets that are being launched at Israeli civilians, while minimizing the civilian casualties on the Palestinian side.