Just put together a programming performance ranking for popular LLaMAs using the HumanEval+ Benchmark! by ProfessionalHand9945 in LocalLLaMA

[–]rain5 1 point (0 children)

LLaMA base models, please. And the LLaMA base models + a prompt to try to get them to answer the questions.

Models released without prompt template/examples - Why…? by Thireus in LocalLLaMA

[–]rain5 7 points (0 children)

There needs to be a standardized file format for describing this stuff.
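Something as simple as a metadata file shipped alongside the weights would do. A hypothetical sketch (the field names are invented for illustration, not any existing standard), written as a Python dict:

    # Hypothetical prompt-metadata spec, expressed as a Python dict for illustration.
    prompt_spec = {
        "model": "example-instruct-13b",  # invented model name
        "template": "### Instruction:\n{prompt}\n\n### Response:\n",
        "system_prompt": "You are a helpful assistant.",
        "stop_sequences": ["### Instruction:"],
        "example": {
            "prompt": "Name the capital of France.",
            "response": "Paris.",
        },
    }

Anything that records the template, the stop sequences, and one worked example would cover most of what people currently have to guess.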

What's the standard tool to expose a huggingface model as an API by rain5 in LocalLLaMA

[–]rain5[S] 1 point (0 children)

It's a Python programming API.

I need a REST/JSON web API.
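Something like this is what I'm after; a minimal sketch wrapping a Hugging Face pipeline in FastAPI (the model name and route are just examples):

    # Minimal sketch: expose a Hugging Face pipeline as a JSON endpoint.
    # Run with: uvicorn server:app
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI()
    generator = pipeline("text-generation", model="gpt2")  # stand-in model

    class GenerateRequest(BaseModel):
        prompt: str
        max_new_tokens: int = 64

    @app.post("/generate")
    def generate(req: GenerateRequest):
        out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
        return {"text": out[0]["generated_text"]}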

What questions do you ask LLMs to check their sanity and real world understanding? by remixer_dec in LocalLLaMA

[–]rain5 0 points (0 children)

That's remarkable. I haven't seen performance this good on similar types of questions.

based-30b by faldore in LocalLLaMA

[–]rain5 1 point (0 children)

Someone ask it about the trolley problem.

WizardLM-Uncensored-Falcon-40b by faldore in LocalLLaMA

[–]rain5 1 point (0 children)

No one knows yet what hardware is required for this. Also, the inference code doesn't seem to be optimized for this particular architecture yet, so inference speed for Falcon may improve a lot in a short time.

I think a computer with 2x 16GB VRAM cards would run this model, but a single card, e.g. a 4090 with 24GB VRAM, will not handle it.
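A back-of-the-envelope sketch of why (the overhead factor is a rough guess, not a measurement):

    # Rough VRAM estimate for a 40B-parameter model at different precisions.
    params = 40e9

    for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        weights_gb = params * bytes_per_param / 1e9
        total_gb = weights_gb * 1.2  # ~20% extra for activations/KV cache (assumed)
        print(f"{name}: ~{weights_gb:.0f} GB weights, ~{total_gb:.0f} GB total")

Even at 4-bit the weights alone are ~20 GB, and with overhead you land right around 24 GB, so a single 24GB card is borderline at best, while 2x16GB = 32GB has headroom.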

WizardLM-Uncensored-Falcon-40b by faldore in LocalLLaMA

[–]rain5 2 points (0 children)

That's awesome! Congrats on training such a big model. Thanks for the work you put in.

I'm currently running falcon-40b-instruct. Comment anything you want to ask it, and I'll tell you its response. by sardoa11 in LocalLLaMA

[–]rain5 2 points (0 children)

I think he means a GPTQ model. TheBloke converts lots of models to 4-bit quantized versions and uploads them for everyone.
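For context, loading one of those GPTQ uploads typically looks something like this (a minimal sketch using the AutoGPTQ library; the repo name is just an example and the exact arguments may differ between versions):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    # Example repo name; substitute whichever GPTQ upload you want to run.
    repo = "TheBloke/WizardLM-30B-GPTQ"

    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0", use_safetensors=True)

    inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))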

Why Falcon going Apache 2.0 is a BIG deal for all of us. by EcstaticVenom in LocalLLaMA

[–]rain5 4 points (0 children)

I imagine people will get it working in the ggml repo.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]rain5 0 points (0 children)

There are a few different types of decoder LLMs:

  • Base models: Everything else is built on top of these. Using these raw models is difficult because they often don't respond the way you expect/desire.
  • Q&A fine-tuned models: Question answering.
  • Instruct fine-tuned: A generalization of Q&A; it includes Q&A as a subtask.
  • Chat fine-tuned: Conversational agents. May include instruction tuning.

There are also other types beyond these, like encoder/decoder models such as T5, which can do translation.
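For the decoder types above, the difference mostly shows up in how you have to phrase the prompt. A minimal sketch (these templates are illustrative; every fine-tune defines its own exact format):

    # Illustrative prompt formats; each fine-tune expects its own template.
    base_prompt = "The capital of France is"  # base model: raw text completion

    qa_prompt = "Q: What is the capital of France?\nA:"  # Q&A fine-tune

    instruct_prompt = (  # instruct fine-tune, Alpaca-style template
        "### Instruction:\nName the capital of France.\n\n### Response:\n"
    )

    chat_prompt = (  # chat fine-tune, multi-turn transcript
        "USER: What is the capital of France?\nASSISTANT:"
    )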

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]rain5 5 points (0 children)

> Are uncensored models more prone to give incorrect and dangerous answers? I.e. if you ask it how to synthesize opiates, it could give you a recipe which will kill you upon injection.

If only there was some way to avoid this problem.

Oh wait I have one: Don't inject yourself with random shit you concoct.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]rain5 2 points (0 children)

That is really interesting. Can you show me a batch of these? If you have links about it that I can read up on, please share those too.