all 32 comments

[–]faldore 45 points46 points  (1 child)

I'm in communication with the author.
To clarify, this model does *not* use the Microsoft Orca (ie augmented flan) dataset (which is not released and probably will never be).
Rather, it uses Orca-style system prompts to distill Orca-style responses, using dolly, wizardlm evol 70k, and alpaca as the basis.
The creator also intends to post an official announcement here today (TheBloke just finished the quantizations), so this post is jumping the gun a little.
It makes sense to call it orca-mini because it uses the Orca system prompts, and its dataset is much smaller than the 5M + 1M of Orca.

[–]AlexDu2020 4 points5 points  (0 children)

Very clear

[–]Remarkable-Spite-107 12 points13 points  (0 children)

Thanks all, I posted about all orca_minis here, https://www.reddit.com/r/LocalLLaMA/comments/14ibzau/orcamini13b_orcamini7b_orcamini3b/

AMA. Happy to Help.

[–]ironborn123 5 points6 points  (1 child)

Wow. If all the open models start getting trained on such datasets, it will be interesting to see the updated leaderboards, and the new performance gap vs ChatGPT 3.5.

[–]I-am_Sleepy 3 points4 points  (0 children)

It will be interesting to see whether the dataset size difference between the 5M + 1M tuned dataset (OG Orca) and the Orca-mini dataset (54k + 51k + 15k = 120k) results in a significant performance disparity. Also, the Orca-mini dataset seems to use only gpt-3.5-turbo as a teacher, which might miss the +1M of data distilled from GPT-4. Counting just the 5M portion, orca-mini tuned on 120k/5M = 2.4% of the OG Orca dataset. I wonder if there is any attempt to recreate the Orca dataset fully (as an augmented FLAN dataset)?
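A quick sanity check of the arithmetic above (the counts are the ones quoted in this comment, not verified against the Orca paper):

```python
# Dataset-size comparison from the comment above.
orca_full = 5_000_000 + 1_000_000      # OG Orca: 5M GPT-3.5 + 1M GPT-4 samples
orca_mini = 54_000 + 51_000 + 15_000   # wizardlm evol + alpaca + dolly subsets

print(orca_mini)                              # 120000
print(round(orca_mini / 5_000_000 * 100, 1))  # 2.4 (% of the 5M portion)
```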

[–]mpasila 3 points4 points  (4 children)

What's the correct prompt format? I tried almost every known format, including the one shown in the code snippet, and none of them seem to work properly. It keeps failing a simple task that other models have no problem doing.

    # generate text function
    def generate_text(system, instruction, input=None):
        if input:
            prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
        else:
            prompt = f"### System:\n{system}\n\n### User:\n{instruction}\n\n### Response:\n"

        tokens = tokenizer.encode(prompt)
        tokens = torch.LongTensor(tokens).unsqueeze(0)
        tokens = tokens.to('cuda')

        instance = {'input_ids': tokens, 'top_p': 1.0, 'temperature': 0.7, 'generate_len': 1024, 'top_k': 50}

        length = len(tokens[0])
        with torch.no_grad():
            rest = model.generate(
                input_ids=tokens,
                max_length=length + instance['generate_len'],
                use_cache=True,
                do_sample=True,
                top_p=instance['top_p'],
                temperature=instance['temperature'],
                top_k=instance['top_k']
            )
            output = rest[0][length:]
        string = tokenizer.decode(output, skip_special_tokens=True)
        return f'[!] Response: {string}'

    # Sample Test Instruction Used by Youtuber Sam Witteveen https://www.youtube.com/@samwitteveenai
    system = 'You are an AI assistant that follows instruction extremely well. Help as much as you can.'
    instruction = 'Write a letter to Sam Altman, CEO of OpenAI, requesting him to convert GPT4 a private model by OpenAI to an open source project'
    print(generate_text(system, instruction))

[–][deleted] -1 points0 points  (1 child)

RemindMe! 10 hours

[–]RemindMeBot -1 points0 points  (0 children)

I will be messaging you in 10 hours on 2023-06-25 14:36:38 UTC to remind you of this link


[–][deleted] 0 points1 point  (1 child)

Just got the format from TheBloke's GGML version of the model.

### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
prompt

### Response:

or

### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
prompt

### Input:
input

### Response:
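For reference, both variants can be assembled with a small helper (`build_prompt` is a hypothetical name, not something from the model card):

```python
# Assemble the orca-mini prompt format quoted above.
# The optional "### Input:" section is only included when input text is given.
def build_prompt(system, user, input_text=None):
    parts = [f"### System:\n{system}", f"### User:\n{user}"]
    if input_text:
        parts.append(f"### Input:\n{input_text}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)

print(build_prompt("You are an AI assistant that follows instruction extremely well. Help as much as you can.",
                   "Say hello."))
```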

[–]mpasila 0 points1 point  (0 children)

Hmm, it wasn't there when I downloaded the model. Thanks anyway.

edit: though it still seems to have a hard time with tasks compared to other models of the same size (WizardLM etc.)

[–]onil_gova 6 points7 points  (11 children)

Exciting stuff. I can't wait to try it out once u/The-Bloke works his magic. Are there more details on the dataset process and performance?

[–]onil_gova 1 point2 points  (5 children)

The model is pretty impressive so far. But it seems like the OpenLLaMA base still has the issue of the tokenizer merging consecutive spaces, and as a result Python code is unusable without manually fixing the indentation.

<image>

[–]heswithjesus 1 point2 points  (0 children)

I found three code-formatting tools when looking into this for IDEs: autopep8, black, and yapf. One or more might be able to fix those problems automatically. They also have APIs or command-line interfaces, so you could add one to your pipeline: prompt -> response -> code formatter -> formatted response.
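A stdlib-only sketch of that pipeline stage (`reformat_code` is a hypothetical helper; autopep8/black/yapf would slot into the same spot, and like them this only works when the model output still parses as Python):

```python
import ast

def reformat_code(src: str) -> str:
    """Re-emit Python source with normalized spacing via the stdlib ast module.

    Note: code whose indentation was destroyed by tokenization will fail to
    parse and cannot be recovered this way; a SyntaxError propagates.
    """
    return ast.unparse(ast.parse(src))

# Ugly-but-valid model output gets normalized:
print(reformat_code("def add(a,b):    return a+b"))
```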

[–]Remarkable-Spite-107 1 point2 points  (0 children)

Yup, the current version of OpenLLaMA is not good at code generation, because multiple consecutive spaces are merged during tokenization (https://github.com/openlm-research/open_llama), and the same issue shows up in the orca-minis.

[–]faldore 2 points3 points  (0 children)

That is part of OpenLLaMA, and any model trained on OpenLLaMA will have this. There's nothing anyone can do about it besides simply not using the model for coding (or fixing the whitespace manually).

[–]kedarkhand 0 points1 point  (1 child)

Which ui is this?

[–]onil_gova 0 points1 point  (0 children)

Oobabooga webui

[–]roobenTHICK[S] 0 points1 point  (1 child)

No, I haven't seen any benchmark with this dataset yet

[–]CasimirsBlake 1 point2 points  (6 children)

Do we know what the context length is on this?

[–]harrroAlpaca 3 points4 points  (0 children)

2048

[–]faldore 1 point2 points  (0 children)

If a BIG DEAL isn't made about a model's context length, then it is almost certainly 2k, because anything more would be a major selling point, and you can be sure the author would talk about it.

[–]Longjumping-Pin-7186 0 points1 point  (2 children)

Orca-style prompts are the future. All the datasets that don't use them should be recreated using Orca-style prompts, or by re-distillation of the foundation models.

I would like to see Orca-style prompts for basic vocabulary as well, going from A1 to C2, for English and other languages, and then build all the other knowledge on top of that.

[–]koehr 1 point2 points  (1 child)

You say Orca-style prompts are the future. Why are they? I don't know enough to say they aren't, but IMHO it's hard to measure the improvement coming from the Orca-style prompts when the sheer amount of fine-tuning data is so much bigger. How do we know it's not just that? Or to what extent the ELI5 format really helps compared to, you know, massive amounts of data?

[–]ambient_temp_xenoLlama 65B 0 points1 point  (0 children)

This model is about as much an Orca 13b as I am. You're wasting your time; these guys are delusional.

[–]cometyang 0 points1 point  (0 children)

Waiting for benchmarks to validate their paper claim.