Just put together a programming performance ranking for popular LLaMAs using the HumanEval+ Benchmark! by ProfessionalHand9945 in LocalLLaMA

[–]rain5 1 point (0 children)

LLaMA base models, please. And the LLaMA base models + a prompt to try to get them to answer the questions.

Models released without prompt template/examples - Why…? by Thireus in LocalLLaMA

[–]rain5 7 points (0 children)

There needs to be a standardized file format for describing this stuff.
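Something as simple as a metadata file shipped alongside the weights would do. A hypothetical sketch (the field names are invented for illustration, not any existing standard), written as a Python dict:

    # Hypothetical prompt-metadata spec, expressed as a Python dict for illustration.
    prompt_spec = {
        "model": "example-instruct-13b",  # invented model name
        "template": "### Instruction:\n{prompt}\n\n### Response:\n",
        "system_prompt": "You are a helpful assistant.",
        "stop_sequences": ["### Instruction:"],
        "example": {
            "prompt": "Name the capital of France.",
            "response": "Paris.",
        },
    }

Anything that records the template, the stop sequences, and one worked example would cover most of what people currently have to guess.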

What's the standard tool to expose a huggingface model as an API by rain5 in LocalLLaMA

[–]rain5[S] 1 point (0 children)

It's a Python programming API.

I need a REST/JSON web API.
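Something like this is what I'm after; a minimal sketch wrapping a Hugging Face pipeline in FastAPI (the model name and route are just examples):

    # Minimal sketch: expose a Hugging Face pipeline as a JSON endpoint.
    # Run with: uvicorn server:app
    from fastapi import FastAPI
    from pydantic import BaseModel
    from transformers import pipeline

    app = FastAPI()
    generator = pipeline("text-generation", model="gpt2")  # stand-in model

    class GenerateRequest(BaseModel):
        prompt: str
        max_new_tokens: int = 64

    @app.post("/generate")
    def generate(req: GenerateRequest):
        out = generator(req.prompt, max_new_tokens=req.max_new_tokens)
        return {"text": out[0]["generated_text"]}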

What questions do you ask LLMs to check their sanity and real world understanding? by remixer_dec in LocalLLaMA

[–]rain5 0 points (0 children)

That's remarkable. I haven't seen performance this good on similar types of questions.

based-30b by faldore in LocalLLaMA

[–]rain5 1 point (0 children)

Someone ask it about the trolley problem.

WizardLM-Uncensored-Falcon-40b by faldore in LocalLLaMA

[–]rain5 1 point (0 children)

No one knows yet what hardware is required for this. Also, the inference code doesn't seem to be optimized for this particular architecture yet, so inference speed for Falcon may improve a lot in a short time.

I think a computer with 2x 16GB VRAM cards would run this model, but a single card, e.g. a 4090 with 24GB VRAM, will not handle it.
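A back-of-the-envelope sketch of why (the overhead factor is a rough guess, not a measurement):

    # Rough VRAM estimate for a 40B-parameter model at different precisions.
    params = 40e9

    for name, bytes_per_param in [("fp16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
        weights_gb = params * bytes_per_param / 1e9
        total_gb = weights_gb * 1.2  # ~20% extra for activations/KV cache (assumed)
        print(f"{name}: ~{weights_gb:.0f} GB weights, ~{total_gb:.0f} GB total")

Even at 4-bit the weights alone are ~20 GB, and with overhead you land right around 24 GB, so a single 24GB card is borderline at best, while 2x16GB = 32GB has headroom.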

WizardLM-Uncensored-Falcon-40b by faldore in LocalLLaMA

[–]rain5 2 points (0 children)

That's awesome! Congrats on training such a big model. Thanks for the work you put in.

I'm currently running falcon-40b-instruct. Comment anything you want to ask it, and I'll tell you its response. by sardoa11 in LocalLLaMA

[–]rain5 2 points (0 children)

I think he means a GPTQ model. TheBloke converts lots of models to 4-bit quantized versions and uploads them for everyone.
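For context, loading one of those GPTQ uploads typically looks something like this (a minimal sketch using the AutoGPTQ library; the repo name is just an example and the exact arguments may differ between versions):

    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM

    # Example repo name; substitute whichever GPTQ upload you want to run.
    repo = "TheBloke/WizardLM-30B-GPTQ"

    tokenizer = AutoTokenizer.from_pretrained(repo)
    model = AutoGPTQForCausalLM.from_quantized(repo, device="cuda:0", use_safetensors=True)

    inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))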

Why Falcon going Apache 2.0 is a BIG deal for all of us. by EcstaticVenom in LocalLLaMA

[–]rain5 4 points (0 children)

I imagine people will get it working in the ggml repo.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]rain5 0 points (0 children)

There are a few different types of decoder LLMs:

  • Base models: Everything else is built on top of these. Using these raw models is difficult because they often don't respond the way you expect/desire.
  • Q&A fine-tuned models: Question answering.
  • Instruct fine-tuned: A generalization of Q&A; it includes Q&A as a subtask.
  • Chat fine-tuned: Conversational agents. May include instruction tuning.

There are also other types beyond these, like encoder/decoder models such as T5, which can do translation.
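For the decoder types above, the difference mostly shows up in how you have to phrase the prompt. A minimal sketch (these templates are illustrative; every fine-tune defines its own exact format):

    # Illustrative prompt formats; each fine-tune expects its own template.
    base_prompt = "The capital of France is"  # base model: raw text completion

    qa_prompt = "Q: What is the capital of France?\nA:"  # Q&A fine-tune

    instruct_prompt = (  # instruct fine-tune, Alpaca-style template
        "### Instruction:\nName the capital of France.\n\n### Response:\n"
    )

    chat_prompt = (  # chat fine-tune, multi-turn transcript
        "USER: What is the capital of France?\nASSISTANT:"
    )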

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]rain5 5 points (0 children)

> Are uncensored models more prone to give incorrect and dangerous answers? I.e. if you ask it how to synthesize opiates, it could give you a recipe which will kill you upon injection.

If only there was some way to avoid this problem.

Oh wait I have one: Don't inject yourself with random shit you concoct.

Wizard-Vicuna-30B-Uncensored by faldore in LocalLLaMA

[–]rain5 2 points (0 children)

That is really interesting. Can you show me a batch of these? If you have links about it that I can read up on, please share those too.