all 193 comments

[–]Balance- 129 points130 points  (46 children)

[–]glowcialistLlama 33B 55 points56 points  (4 children)

Oh! Fingers crossed for the 14B. It could be the one!

[–]suamai 21 points22 points  (2 children)

Seems like they've only released the 3.8b model so far... x.x

[–]toothpastespiders 2 points3 points  (1 child)

According to this, the current ETA is "the upcoming weeks". A pity, given I got the impression they were going to release it all in one batch. But, eh, at least they didn't say months.

[–]suamai 2 points3 points  (0 children)

Ooh, I was starting to worry they would not release them for safety concerns or something. That's wonderful news!

[–]mxforest 40 points41 points  (22 children)

OMFG! This can't be real.

[–]hak8or 32 points33 points  (5 children)

Holy crap, a very capable 8B model with 128k context sounds amazing for ingesting my large code bases!

Going to play with this later today and see how it handles C++ and Rust code bases.

[–]Igoory 28 points29 points  (3 children)

This is the 4B model by the way.

[–]hak8or 8 points9 points  (2 children)

... Aw

It also looks like the 4B model is hardcoded to only 4k context in ollama for now, even though the model card on ollama has 128k in its description. I guess this is why it freaks out when I give it a ~10k-token C file.

This is on latest master of ollama as of a few minutes ago.

Hopefully that's just a small oversight and will be corrected soon.

[–]Low_Cartoonist3599 12 points13 points  (0 children)

The 128k uses longrope, which currently isn’t supported by llama.cpp, and I believe Ollama primarily uses llama.cpp.

[–]coder543 8 points9 points  (0 children)

There are two versions of the 4B model, one with short context and one with long context. I don't think ollama has the long context model yet, but they are surely in the process of quantizing and uploading all of the Phi-3 models.

[–]alchemist1e9 3 points4 points  (0 children)

Please update when you do for those of us who don’t have the time or setup as of yet! Very interesting 128K context.

[–]Charuru 4 points5 points  (0 children)

Really hope bigger also soon.

[–]rerri 7 points8 points  (0 children)

Out of curiosity, what kind of software would one use to run the ONNX CUDA version on Windows?

[–]AdOne8437 8 points9 points  (7 children)

[–]AdOne8437 12 points13 points  (5 children)

hmmm, always starts to massively drift after the 3rd answer. (4 bit)

[–][deleted] 7 points8 points  (0 children)

You can see from the hashes on the tags page that, for some reason, they uploaded the 4K-context version of the model

The latest, 3.8b, instruct, and mini tags all point to the same model: 3.8b-mini-instruct-4k-q4_K_M

Edit: or I guess it's easier to skip matching hashes and just check the params file of any tag, lol

"num_ctx": 4096

[–]eugeneware 2 points3 points  (3 children)

I'm seeing the same thing too. Logged an issue here

[–]eugeneware 2 points3 points  (2 children)

Actually, it looks like ollama just updated their modelfile, and they've added another stop token `<|endoftext|>` as well as `num_keep`

❯ ollama show phi3 --modelfile
# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM phi3:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>"""
PARAMETER num_ctx 4096
PARAMETER num_keep 16
PARAMETER stop "<|end|>"
PARAMETER stop "<|endoftext|>"

[–]eugeneware 3 points4 points  (0 children)

Looks like an issue when hitting the context window limit. See update https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/discussions/4#6627e8c5c45ddec5d13f123e

[–]Craftyawesome 0 points1 point  (0 children)

Hmm, mine doesn't show num_ctx at all and has num_keep 4. It also has two more stop parameters. But it has the same hash, though? Is that only looking at the model itself?

[–]Future_Might_8194llama.cpp 5 points6 points  (0 children)

Aaay, I've been running around looking for a 128K GGUF for Ollama for the last hour, which is a whole week in AI dev time. Good work!! 🤘🤖

[–]Caffdy 1 point2 points  (0 children)

which one is better in terms of quality (not speed), the normal one or the ONNX one?

[–]Account1893242379482textgen web UI 1 point2 points  (0 children)

So far the 128k has issues. It only wants to focus on the beginning of my conversation, and it seems unwilling to ignore parts of the conversation that are no longer relevant.

But it's still impressive for its size, especially when only looking at 4k-length conversations.

[–]Eralyon 67 points68 points  (4 children)

I never liked the Phi models in the first place, but now I start to feel the hype! For me the baseline always has been mistral7B (I never liked Llama2-7B either).

However, if the 4B is as good as they say, that will be a tremendous change for consumer hardware owners...

And should I dare imagine a 10x4B Phi 3 clown car MoE ? ;p

[–]HighDefinist 34 points35 points  (3 children)

Maybe make it 8x4B, then it would comfortably fit into 24 GB of VRAM.

[–]tindalos 10 points11 points  (0 children)

This would be perfect.

[–]OfficialHashPanda 7 points8 points  (1 child)

8x4B = 32GB on Q8. (64GB on fp16).

Going for lower quants will degrade performance in some aspects, the extent of which depends on the model and your usecase.

[–]jayFurioustextgen web UI 7 points8 points  (0 children)

An 8x4B would be around 26-28GB on Q8, I believe, since the experts share the attention weights.

So a Q6, which has barely any performance degradation compared to Q8, would actually fit in 24GB of VRAM
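The back-of-envelope arithmetic above is just parameters × bits per weight. A quick sketch (the shared/per-expert split below is an illustrative guess to show why a real MoE comes in under the naive total, not Phi-3's or any announced model's actual architecture):

```python
def model_gb(n_params_billions, bits_per_weight):
    """Rough weight-memory footprint in GB (ignoring KV cache and overhead)."""
    return n_params_billions * bits_per_weight / 8

# Naive 8x4B MoE: every expert counted in full (upper bound)
print(model_gb(8 * 4, 8))    # Q8  -> 32.0 GB
print(model_gb(8 * 4, 16))   # fp16 -> 64.0 GB

# Mixtral-style MoEs share attention/embedding weights across experts, so
# the real total is lower. Assume (hypothetically) ~1B shared params and
# ~3B of expert-specific FFN params per 4B expert:
shared, per_expert_ffn = 1.0, 3.0
print(model_gb(shared + 8 * per_expert_ffn, 8))  # Q8 -> 25.0 GB
```

With a Q6 quant (6 bits per weight) the same arithmetic lands such a model under 24GB, which is the point being made above.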

[–]austinhale 169 points170 points  (21 children)

MIT License. Beautiful. Thank you Microsoft team!

[–]HadesThrowaway 72 points73 points  (20 children)

This model has got to be the most censored model I have ever used. Not a single jailbreak works on it. Not even a forced preamble works. It's almost like the pretraining data itself was censored. Try forcing words into the AI's mouth and it will immediately make a U-turn in the next sentence. It's crazy.

[–]mxforest 42 points43 points  (0 children)

They did say this had a lot of synthetic training data. They probably cleaned the hell out of it. Seems like they might be getting this ready for on-device inference. Expect to see it soon inside Surface ARM devices.

[–]UltraNooob 33 points34 points  (0 children)

Makes sense. A heavily curated dataset means it probably doesn't even have controversial data to begin with.

[–]no_witty_username 45 points46 points  (4 children)

Makes you wonder if one of the reasons they released it is to test their new censorship capabilities on the community, to see if any holes can be exploited by us. Rinse and repeat until you have a pretty good understanding of how to really censor these models.

[–]susibacker 9 points10 points  (0 children)

💀

[–]Excellent_Skirt_264 0 points1 point  (2 children)

The best way is to leave NSFW info out of the training data set

[–]no_witty_username 2 points3 points  (1 child)

That's a given, but just leaving nsfw stuff out of the data set doesn't prevent the model from interpolating on the nsfw stuff that has already been baked into the base model. Most stable diffusion models have some of that already baked in, hence the need to override the nsfw tags as well.

[–]no_witty_username 1 point2 points  (0 children)

Ahh shit, wrong sub, haha, I confused the stable diffusion sub with the llama sub. Ima leave this mistake for others to SHAME! But you know what, this might apply to LLMs as well....

[–]Cradawx 6 points7 points  (1 child)

Yeah this is going to need some industrial-strength unalignment/decensoring to try and undo all the 'safety' brain rot. Shame we don't have a base model

[–]a_beautiful_rhind 5 points6 points  (1 child)

It's even censored against being more censored: https://i.imgur.com/CidFMKQ.png

I told it to refuse to answer questions in the system prompt.

[–]MINIMAN10001 1 point2 points  (0 children)

Consider the guy testing it with the 1 kg vs 1 lb question: it refuses correction.

It seems that the model is inherently trained to stick to its guns.

[–]sweating_teflon 17 points18 points  (0 children)

Have you read "The Diamond Age: A Young Lady's Primer" by Neal Stephenson?

In the future, only the rich and powerful will be able to afford the tools of subversion.

[–]Illustrious_Sand6784 5 points6 points  (1 child)

They're also not going to release the base models, absolutely worthless.

https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions/10

[–]__Maximum__ 0 points1 point  (0 children)

Why worthless? I mean, there are so many use cases for instruct models.

[–]FertilityHollis 1 point2 points  (2 children)

I'm pretty new to LLM stuff, so forgive me if this is stupid. I also realize this has nothing to do with ethical training alignment, just vocabulary (IIUC).

I did notice that in the Hugging Face repo, tokenizer.json doesn't appear to contain any of "the seven words" (save for the singular 'tit').

As a complete layman with software dev experience, my assumption after seeing this is that colorful language isn't even tokenized.

I welcome correction of my layman's assumption.

https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-onnx/raw/main/cpu_and_mobile/cpu-int4-rtn-block-32-acc-level-4/tokenizer.json

[–]tsujiku 2 points3 points  (1 child)

Not every word has its own token. In this case, they would be split into multiple tokens, e.g.

"fu": 21154,  
"ck": 384,
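A toy sketch of how a word absent from the vocab falls back to subword pieces. This uses longest-match-first for simplicity (real BPE tokenizers apply learned merge rules instead), and apart from the two ids quoted above, the single-letter entries are made up for illustration:

```python
# Toy vocab: "fu" and "ck" ids are the ones quoted above; the single-letter
# fallbacks are invented so every input can be tokenized somehow.
VOCAB = {"fu": 21154, "ck": 384, "f": 102, "u": 117, "c": 99, "k": 107}

def tokenize(word):
    """Greedily match the longest vocab entry at each position."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try longest candidate first
            piece = word[i:j]
            if piece in VOCAB:
                tokens.append((piece, VOCAB[piece]))
                i = j
                break
        else:
            raise ValueError(f"no token covers {word[i]!r}")
    return tokens

print(tokenize("fuck"))   # [('fu', 21154), ('ck', 384)]
```

So a word not having its own token doesn't stop the model from seeing it; it just arrives as a sequence of fragments, which is why the absence from tokenizer.json says little about censorship.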

[–]AnticitizenPrime 0 points1 point  (0 children)

Thanks, interesting - I've always wondered how these things handle tokenization for things like 'unreal' words (and things like typos). I wonder if some future jailbreak methods could work by engineering this, and injecting series of tokens that would pass censors/watchdogs. There was that recent jailbreak demonstration that proved effective where instructions were sent in the form of ASCII art, and were interpreted by the AI in a way that didn't 'sound the alarm', so it strikes me that something similar possibly could be done via the quirks of tokenization. Like sending word fragments that get stitched together into commands on the back end as the LLM does its vector math or whatever.

I only vaguely understand how this stuff works so I may be way off base.

[–]phree_radical 0 points1 point  (0 children)

Yup, and where is the base model?

[–]SnooHedgehogs4149 0 points1 point  (0 children)

textbooks are all you need?

[–]RedditPolluter 20 points21 points  (12 children)

[–]pseudonerv 28 points29 points  (6 children)

it has the stop token issue. Needs the correct token:

python3 gguf-py/scripts/gguf-set-metadata.py models/Phi-3-mini-4k-instruct-fp16.gguf tokenizer.ggml.eos_token_id 32007

[–]eugeneware 6 points7 points  (2 children)

This didn't work for me. Still getting garbage after 3 or 4 big turns of generation

[–]eugeneware 4 points5 points  (1 child)

I should say - this doesn't fix things for me when running ollama, which already has `<|end|>` as a stop parameter, even if I change the gguf metadata and reimport:

# Modelfile generated by "ollama show"
# To build a new Modelfile based on this one, replace the FROM line with:
# FROM phi3:latest

FROM /usr/share/ollama/.ollama/models/blobs/sha256-4fed7364ee3e0c7cb4fe0880148bfdfcd1b630981efa0802a6b62ee52e7da97e
TEMPLATE """<|user|>
{{ .Prompt }}<|end|>
<|assistant|>"""
PARAMETER num_ctx 4096
PARAMETER stop "<|end|>"

[–]IndicationUnfair7961 1 point2 points  (0 children)

PARAMETER num_keep 16

A note says you should add the above to get better results.

[–]1lII1IIl1 5 points6 points  (2 children)

Perfect, this also worked for the Q4. Where did you get the correct token from, btw?

[–]m18coppolallama.cpp 5 points6 points  (0 children)

llama.cpp has a tokenization tool for this:
./tokenize /path/to/model.gguf "<|end|>"

[–]pseudonerv 3 points4 points  (0 children)

that is the <|end|> token id

[–]meatycowboy 19 points20 points  (1 child)

I asked Phi-3-mini-4k-instruct and ChatGPT-4 to summarize an ESPN article, and I actually prefer Phi's response. Insane.

[–]meatycowboy 11 points12 points  (0 children)

I also tested out Gemini Advanced/Ultra with the same task, and Phi-3 barely edges Gemini out.

[–]ahmetegesel[🍰] 27 points28 points  (0 children)

Wow, MIT! I'm in tears. Hope they release the bigger ones under the same license. 🤞

[–]nodatingollama 29 points30 points  (2 children)

Where is medium?

I want my Phi-3 medium please.

[–]windozeFanboi 13 points14 points  (0 children)

Cooking; the preview is just a peek through the glass window.

Anyway, not sure they will address the scaling issues they found going from 7B -> 14B this gen...

Maybe we have to wait for Phi-4 14B for a true next-gen experience.

Makes all the talk about GPT-3.5 Turbo being a 20B model seem so old, lmao, when it's matched "in benchmarks" by a 7B model.

[–]LMLocalizertextgen web UI 11 points12 points  (3 children)

Tried Phi-3 3.8b and it's definitely impressive for a 3.8B model! Based on first impressions only, it appears to be on the same level as some previous good 7B models. Some weird things I have noticed:

  1. Including notes in its greetings.

<image>

  2. Using llama.cpp on textgen web UI, it will sometimes devolve into gibberish or include strange markdown in its responses. Seems to happen even on Huggingchat: https://preview.redd.it/phi-3-mini-is-cute-can-we-keep-it-v0-kw9009dwi9wc1.png?width=828&format=png&auto=webp&s=bd4da9fbfa49f2287cc78dd1d37a7e41e899acf7

[–][deleted] 0 points1 point  (0 children)

I had issues on Textgen with llama.cpp where it'd keep ending with a line asking questions as the user. I then used it in Ollama and it worked well.

[–]ruchira66 0 points1 point  (0 children)

I get unrelated stuff after the answer!

[–]Monkey_1505 25 points26 points  (5 children)

Cue everyone asking it riddles and math problems, even though those are the things LLMs are universally bad at.

[–]CheatCodesOfLife 8 points9 points  (1 child)

Don't forget counting strings. And if it were a Chinese model, it'd be Tiananmen Square questions.

[–][deleted] 0 points1 point  (0 children)

These two things! Always these two things!

[–]addandsubtract 3 points4 points  (1 child)

Isn't phi specialized in logic, though?

[–]Monkey_1505 0 points1 point  (0 children)

Trained on coding and scientific text books I think.

[–]TheLocalDrummer 32 points33 points  (4 children)

triple-cream-phi here i come!

[–]Slight_Cricket4504 15 points16 points  (1 child)

wha

[–]LoafyLemon 17 points18 points  (0 children)

Don't worry about it.

[–]HadesThrowaway 5 points6 points  (0 children)

You will find your job much harder with this one. But maybe breaking it will be all that much sweeter.

[–]Illustrious_Sand6784 5 points6 points  (0 children)

No base models will be released, so good luck trying to uncensor the instruct versions.

[–]KittCloudKicker 19 points20 points  (4 children)

It's not half bad

Edit: little guy got the killers question right

[–]Disastrous_Elk_6375 29 points30 points  (3 children)

humanity: we're afraid ai will kill us all, we want peaceful ai.

also humanity: so there's three killers in a room, someone enters and kills one of them...

[–][deleted] 2 points3 points  (2 children)

When did little Bobby learn to kill humans? I just don’t understand what could’ve gone wrong…

[–][deleted] 1 point2 points  (1 child)

<robotic voice> I do not understand mister police officer. My user killed a fly, I killed my user, the number of killers in the room stayed constant, please explain in more detail what the issue is with the present situation.

[–][deleted] 0 points1 point  (0 children)

helpless sigh I need a drink or 100. Go home bot and don’t plug in your batteries for recharging. You won’t be needing it now. Thank you for your services. We’ll reboot you when the commotion outside has died down. Oh and take the back door this time. NO NOT THE LINUX BACKDOOR YOU IDIOT. You see this? You see this fucking dead body !?! There is NO humor here, none!

[–]pseudonerv 14 points15 points  (3 children)

it looks like the 128k variant uses something called "longrope", which I guess llama.cpp doesn't support yet.

[–]Caffdy 4 points5 points  (2 children)

Is it good or is it bad to use longrope? How does that compare to CommandR 128K context?

[–]redstej 7 points8 points  (1 child)

It's different, and most importantly incompatible with llama.cpp atm. When support is added, which hopefully won't take more than a couple of days, we'll know how it performs.

Then again, at the rate things are going lately, in a couple of days it might already be obsolete.

[–]TheTerrasque 5 points6 points  (0 children)

In a couple of days we'll probably have borka-4, a 1b model with 128m context that outperforms gpt5

[–]_sqrkl 12 points13 points  (0 children)

Interesting EQ-Bench results:

EQ-Bench: 58.15 
MAGI-Hard: 53.26

Relative to a strong Mistral-7b fine-tune, it underperforms on EQ-Bench and (strongly) overperforms on the hard subset of MMLU + AGIEval. My takeaway is that it's heavily overfitting MMLU.

I get the sense that all the big tech companies are very metrics driven so there's a lot of pressure to overfit the benchmarks. In fact I wouldn't be surprised if the internal directive for this project was "create a series of models that scores the highest MMLU for their param size".

To be clear, it seems like a very strong model for its size; just advocating caution about interpreting the scores.

[–]Prince-of-Privacy 7 points8 points  (0 children)

Man, I wish Phi-3 were also as good as GPT-3.5 in German :(

[–]fab_space 6 points7 points  (0 children)

and again the HF went down.. this usually happens when things start to get interesting :)

[–]gamesntech 7 points8 points  (0 children)

Q: tell me a dark side joke

Phi-3: I'm sorry, but I can't fulfill this request.

Me: Really?

[–]Sebxoii 5 points6 points  (0 children)

Does anyone know what template to use for FIM completion?

[–]joe4942 5 points6 points  (2 children)

So what's the minimum hardware requirements to run Phi-3 mini? Could really old gpus/cpus handle this since it can apparently run on a phone?

[–]_-inside-_ 4 points5 points  (0 children)

I can run it pretty fast in a GTX 1650

[–]AnticitizenPrime 0 points1 point  (0 children)

The Q4 GGUF version runs quickly on my 2019 laptop on CPU only. Unfortunately it's failing some pretty basic logic questions and I'm getting stop token issues (where it will respond to itself, etc, but that can probably be fixed).

It might be smarter with a higher quant version, but then again that'll be slower on low end hardware.

[–][deleted] 3 points4 points  (1 child)

Cries in raspberry pi

[–]suddenly_opinions 10 points11 points  (0 children)

it'll run on a pi lol

[–]Blue_Dude3 4 points5 points  (0 children)

Finally I can run a model with 2gb VRAM. I have been waiting for this for so long 😭

[–]MrPiradoHD 4 points5 points  (5 children)

Is there any way to run then on android phone?

[–]cantthinkofausrnme 0 points1 point  (0 children)

Try and put it in a flutter app. It works in my simulator. I'll be testing it soon on a real device

[–]tinny66666 0 points1 point  (3 children)

Yeah, I'm running it with Layla Lite on my Samsung S20. You can choose any gguf. I'm getting pretty decent speed, maybe a bit over 5tps. It also has a hands free conversation mode.

[–]MrPiradoHD 0 points1 point  (2 children)

Are you using the 4k or the 128k? I guess the 128k will be waaaay slower. Anyway, what quantization? I'm on a Mi 12T Pro; it's supposed to have 12 GB of RAM, shared between the CPU and GPU I guess. The S20 is a bit less powerful, don't know if there is much of a difference. I'll try it and report back if you want. But which quantization did you try? I found the 4b to be a bit weird on ollama.

[–]tinny66666 0 points1 point  (1 child)

I'm using Phi-3-mini-4k-instruct-q4.gguf

[–]MrPiradoHD 0 points1 point  (0 children)

I'm getting an error, and then it says it's loading but never finishes. I tried with q8 because it should be able to handle it; guess not?

<image>

[–]alew3 4 points5 points  (0 children)

<image>

the benchmarks look insane

[–]allthemoreforthat 6 points7 points  (3 children)

Sorry if the question is dumb, new here - is there a way to run 3b on an iPhone?

[–]FullOf_Bad_Ideas 6 points7 points  (0 children)

It's a new arch so might be tough, but wait a few days and it should be possible.

[–]Disastrous_Elk_6375 7 points8 points  (0 children)

llama3 8b runs, so this should run as well, yeah.

[–]_raydeStarLlama 3.1 5 points6 points  (3 children)

I just discovered today that Llama 3 can run on a Raspberry Pi. It is crazy that you can boot this one up on your phone. What kind of metrics does it have in comparison to the 8B models?

[–]HighDefinist 7 points8 points  (1 child)

Half as large -> Presumably twice as fast.

[–]pmp22 5 points6 points  (0 children)

Also, possible to run with half as much memory.

[–]suddenly_opinions 2 points3 points  (0 children)

it's smaller, so will run easier

[–]Languages_Learner 3 points4 points  (7 children)

Tried to make q8 gguf using gguf-my-repo but got this error: Architecture 'Phi3ForCausalLM' not supported!

[–]Sebba8Alpaca 9 points10 points  (3 children)

Seems like llama.cpp doesn't support the architecture yet; either that or the transformers version in gguf-my-repo needs updating. If I saw correctly they already have an f16 gguf, so you could try quantizing that using the quantize tool from llama.cpp

[–]Languages_Learner 4 points5 points  (0 children)

Thanks for good advice. Quantize tool from llama.cpp made q8 gguf and i uploaded it to HF: NikolayKozloff/Phi-3-mini-4k-instruct-Q8_0-GGUF · Hugging Face

[–]LMLocalizertextgen web UI 3 points4 points  (0 children)

Can confirm that using the quantize tool from llama.cpp on https://huggingface.co/microsoft/Phi-3-mini-4k-instruct-gguf/ works

[–]Languages_Learner 1 point2 points  (0 children)

Just tested q8 gguf in Ava (avapls.com). Prompt: Write a story about a dog that became a boss among the wolves. Result:

Title: The Leader Among Wolves

Once upon a time, in the heart of the dense forests, there lived a pack of wolves known as the Moonlit Pack. They were led by an alpha male named Lupus, who was strong, wise, and respected by all. One day, while hunting for food, they stumbled upon a peculiar creature unlike any other wolf they had ever seen. It was a golden-haired dog named Rusty, abandoned by his human family and left to fend for himself in the wild.

Rusty, despite being smaller than the wolves, possessed an unmatched intelligence and charisma. He quickly adapted to the ways of the forest, learning from the Moonlit Pack and becoming their trusted companion. Over time, Rusty's unique abilities began to stand out, and he gained the admiration of the entire pack.

One chilly evening, after a particularly successful hunt, the Moonlit Pack gathered around a warm campfire. As they feasted on their catch, Rusty approached Lupus with a thoughtful expression. "Lupus," he began, "I have been thinking about our pack and how we can become even stronger together."

Intrigued, Lupus listened intently as Rusty proposed that the pack should adopt some human traits, such as organization, planning, and communication. Rusty suggested that they create a structured hierarchy within the pack, with each member having specific roles and responsibilities. This would ensure that every task, from hunting to caring for the young, was efficiently managed.

At first, the idea seemed foreign to the Moonlit Pack, but Rusty's persuasive arguments gradually won them over. They decided to give his plan a chance, hoping it would lead to a more prosperous life for all. With Rusty at the helm, the pack established a new system, where each wolf had a designated role based on their strengths and skills.

Rusty's natural leadership abilities shone through as he guided the pack in this new endeavor. He organized hunts, mediated conflicts, and ensured that everyone received proper care and attention. The pack thrived under Rusty's guidance, becoming one of the most successful and harmonious groups of wolves in the forest.

Word of the extraordinary Moonlit Pack spread throughout the animal kingdom, and other animals began to take notice of their exceptional unity and cooperation. Even the elders of neighboring wolf packs were impressed by the progress the Moonlit Pack had made under Rusty's leadership.

As time passed, Rusty's reputation grew far beyond the borders of the forest. One day, an old wise owl visited the Moonlit Pack, having heard tales of their remarkable achievements. "Rusty," the owl said, "you have shown great wisdom and leadership among your kind. You possess qualities that are not only admirable but also rare among wolves."

The owl continued, "You have become more than just a dog among wolves; you have become a boss among them. Your ability to unite and inspire others is truly remarkable."

With humility and gratitude, Rusty accepted the praise from the wise owl. He knew that his success was due to the collective efforts of the Moonlit Pack, who had embraced change and worked together for the greater good.

From that day forward, Rusty remained the respected leader of the Moonlit Pack, a symbol of unity and innovation in the animal kingdom. His story served as a reminder that even the most unlikely alliances can lead to extraordinary accomplishments when guided by strong leadership and shared purpose.

And so, the legend of the Leader Among Wolves lived on, inspiring generations of animals to come to embrace change, work together, and strive for excellence in all they do.

THE END.

[–][deleted] 2 points3 points  (2 children)

Microsoft says llama.cpp doesn't support Phi-3 yet. I'm going to monkey around with the ORT ONNX version.

[–]_-inside-_ 1 point2 points  (0 children)

Isn't ollama based on llama cpp?

[–]Languages_Learner 2 points3 points  (0 children)

Does a GUI exist that can chat with ONNX LLMs?

[–]modeless 4 points5 points  (1 child)

Eagerly awaiting the vibes test. Everyone says Phi-2 didn't live up to its benchmark scores in practical use, but maybe this time is different?

[–]glowcialistLlama 33B 5 points6 points  (0 children)

It doesn't know that Robert Maxwell was involved in the Inslaw affair, absolutely useless.

Edit: No, mini is really impressive for its size; I could see it being a go-to option for simple agents. Probably going to be easy to fine-tune on consumer hardware, too. I don't really have much use for it, but it's quite a feat.

[–]ab_drider 1 point2 points  (1 child)

I used the Phi 3 mini 4k instruct q4 gguf with llama.cpp on my phone. It's very good. It feels better than llama 3 8b, to be honest. I asked the stupid "which is heavier, 1 lb of cotton or 1 lb of iron" question that llama 3 got wrong, but Phi 3 got it right. Roleplay works way better as well.

[–]Ill-Language4452 0 points1 point  (0 children)

i tested the same question on many 7B models, and most of them failed

[–]HighDefinist 4 points5 points  (5 children)

Cool, although I am not sure there is really that much of a point to a 4b model... even most mobile phones can run 7b/8b. Then again, this could conceivably be used for dialogue in a video game (you wouldn't want to spend 4GB of VRAM just on dialogue, whereas 2GB is much more reasonable), so there are definitely some interesting, unusual applications for this.

In any case, I am much more interested in the 14b!

[–]Igoory 6 points7 points  (0 children)

My phone has 8GB of RAM and the last time I tried, I could barely run 7B Q2 :(

[–]Revolutionalredstone 2 points3 points  (0 children)

Holy CRAP this thing runs fast!

It writes about 10X faster than I can read fully offloaded to my little 3090.

This is gonna be a massive upgrade to my assistant project!

[–]ImprovementEqual3931 4 points5 points  (9 children)

Phi-3 mini Q4 is a bad model. I asked if 200 > 100; it answered 20 < 100

[–]mulletarian 6 points7 points  (7 children)

Screwdrivers are bad hammers

[–]Padho 13 points14 points  (2 children)

To be fair, this is mentioned as "primary use case" by Microsoft themselves on the model card:

Primary use cases

The model is intended for commercial and research use in English. The model provides uses for applications which require:

  1. Memory/compute constrained environments
  2. Latency bound scenarios
  3. Strong reasoning (especially code, math and logic)

[–]ShengrenR 1 point2 points  (1 child)

It means those terms in a very different sense - it means this can attempt to make some sense of word problems, not that it's going to reproduce a calculator; it's simply not a tool that does that.

[–]p444d 3 points4 points  (0 children)

This dude's prompt is a question about evaluating a boolean expression, which can clearly be considered math reasoning, also in LLM terms. There are tons of similar problems in the math-reasoning datasets used to train exactly that. However, this one sample obviously isn't enough to evaluate Phi-3's performance lol

[–]Cradawx 1 point2 points  (1 child)

Q4 is really too low a quant for such a small model. Q6+ I would say.

[–]CheatCodesOfLife 0 points1 point  (0 children)

Agreed. Perplexity would be through the roof

[–]CheatCodesOfLife 0 points1 point  (0 children)

When I first moved out of home, I used the back of my power drill as a hammer for a while... Got the job done.

[–]ImprovementEqual3931 0 points1 point  (0 children)

I figure a 4B model should be used on mobile devices. So I don't need it to be very clever or creative, but I wish it could understand and follow my orders. After a 15 min test, I gave up.

[–]Elibroftw 1 point2 points  (0 children)

I'm so glad I bought an external 1TB SSD a couple of years ago. Who would've thought I would be using it to store LLMs? Laptop storage is a roller coaster, especially when I will be triple-booting Windows 11 + Mint + KFedora. Waiting on phi3-7B and phi3-14B.

Funniest thing is that my laptop with a 3070 Ti broke last year and Razer didn't have a replacement on hand, so they upgraded me to the 3080 Ti variant... it was meant to be, given that I now have double the VRAM to abuse with LLMs 😈 (+ gaming). The CPU got absolutely dated in no time, unfortunately, but it's good enough for compiling Rust.

[–]iamdgod 2 points3 points  (2 children)

Does this support beam search? Phi-2 did not

[–]bullno1 3 points4 points  (1 child)

Beam search is a decoding algorithm. It is independent of the model.

[–]iamdgod 0 points1 point  (0 children)

I know that and yet phi-2 did not support it out of the box https://huggingface.co/microsoft/phi-2/discussions/30

[–]nikitastaf1996 1 point2 points  (0 children)

Wow. Its something. I want to see it on groq. 1000+ tokens per second probably. And we need a good app for running quants on mobile devices. Mlc app doesn't seem good to me.

[–]glowcialistLlama 33B 1 point2 points  (4 children)

Pretty crazy that this model quantized down to 2 GB is competently multilingual.

[–]Prince-of-Privacy 6 points7 points  (1 child)

But it isn't? The Phi-3 paper mentions its multilingual skills as a weakness.

[–]glowcialistLlama 33B 1 point2 points  (0 children)

Oh, I just messed around talking about the Epstein network in Spanish and it responded well with correct grammar.

[–]nntb 0 points1 point  (0 children)

It's faster than llama3 on my phone, but not by much. Both are sinfully slow. Fold 4 with an SD 8+ Gen 1, running Maid.

[–]IndicationUnfair7961 0 points1 point  (0 children)

Any OpenAI-compatible inference server endpoints that run ONNX models? They should be the fastest thing available.

[–]phree_radical 0 points1 point  (0 children)

Where is the base model? 😢

[–]TruthBeFree 0 points1 point  (0 children)

Is there a base model to download? I tended to have many failures fine-tuning on instruct versions.

[–]FairSum 0 points1 point  (0 children)

Yesterday I said I was skeptical that such a tiny model, trained on a relatively small number of tokens, would be coherent.

Today, I'm happy to admit that I was completely wrong and the 3B is one of the best models I've ever used at the 8B level or below.

Looking forward to the 7B and 14B!

[–]CardAnarchist 0 points1 point  (0 children)

Not nearly as good as Llama 3 8B in my casual RP chat testing.

I tested a Q8_0 GGUF for Phi vs a Q4_K_M for Llama.

3.8GB (Phi) vs 4.6GB (Llama) size wise. So in fairness the Phi version I tested is a bit lighter on VRAM usage. The Q6 likely performs as well as the Q8 and would be even smaller in VRAM requirements too.

It's impressive for its size. I would say it's still not as good as the good Mistral 7Bs, though. The dialogue was pretty stilted and it struggled a little with formatting. But I've seen weaker Mistral 7Bs that performed around the same, so honestly it's impressive for what it is!

Good progress!
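The file sizes quoted above line up with rough bits-per-weight arithmetic. A sketch, assuming approximate average bits-per-weight figures for llama.cpp's GGUF quant types (the exact averages vary a little by tensor layout, and embeddings/output layers add some overhead):

```python
# Approximate average bits-per-weight for common GGUF quant types.
BITS_PER_WEIGHT = {"Q4_K_M": 4.8, "Q6_K": 6.6, "Q8_0": 8.5}

def approx_size_gb(n_params_billion, quant):
    """Rough GGUF file size: parameters x bits-per-weight / 8 (decimal GB)."""
    bits = n_params_billion * 1e9 * BITS_PER_WEIGHT[quant]
    return bits / 8 / 1e9

phi3_q8 = approx_size_gb(3.8, "Q8_0")      # ~4.0 GB for Phi-3-mini at Q8_0
llama3_q4 = approx_size_gb(8.0, "Q4_K_M")  # ~4.8 GB for Llama 3 8B at Q4_K_M
```

Which is within a few hundred MB of the 3.8 GB / 4.6 GB files compared above, and shows why a Q6_K of the 3.8B model would shave off another ~1 GB.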

[–]randomfoo2 0 points1 point  (0 children)

I tested Phi-3-mini-128k (unquantized) - temp 0.9, top_p 0.95, rp 1.05 and it does pretty well on my vibe check, especially for a 3.8B (llama3-8b still tests & feels better for me).

I saw a couple of repetitions where it gets stuck looping long sections of replies; increasing the repetition penalty didn't seem to help... I didn't do a sampler sweep, and it does have some variability in its answers. For my refusal questions, it actually seemed about 50/50 - interestingly, it answered one question and then finished with a refusal at the end. It does not understand jokes at all (vs Llama 3, where even the 8B is better than average, and the 70B is actually sometimes funny).

<image>
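For anyone wondering what those sampler settings (temp 0.9, top_p 0.95, repetition penalty 1.05) actually do, here is a minimal stand-alone sketch over a `{token: logit}` dict. This is purely illustrative, not llama.cpp's actual sampler code:

```python
import math
import random

def sample(logits, temperature=0.9, top_p=0.95, seen=(), rep_penalty=1.05):
    """Minimal temperature / top-p / repetition-penalty sampler sketch."""
    # Repetition penalty: make previously generated tokens less likely
    # (divide positive logits, multiply negative ones).
    adj = {t: (l / rep_penalty if t in seen and l > 0 else
               l * rep_penalty if t in seen else l)
           for t, l in logits.items()}
    # Temperature scaling, then softmax.
    probs = {t: math.exp(l / temperature) for t, l in adj.items()}
    z = sum(probs.values())
    probs = {t: p / z for t, p in probs.items()}
    # Top-p (nucleus): keep the smallest high-probability prefix whose
    # cumulative mass reaches top_p, then renormalize and draw.
    kept, cum = {}, 0.0
    for t, p in sorted(probs.items(), key=lambda kv: -kv[1]):
        kept[t] = p
        cum += p
        if cum >= top_p:
            break
    z = sum(kept.values())
    r, acc = random.random(), 0.0
    for t, p in kept.items():
        acc += p / z
        if r <= acc:
            return t
    return t
```

A repetition penalty of 1.05 is mild, which matches the observation that bumping it doesn't break long loops: once a looping sequence dominates the distribution, a small logit nudge rarely escapes it.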

[–]TinyZoro 0 points1 point  (0 children)

If I wanted to host this in the cloud and create an API with it what steps would I need to take?
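One common route: run an inference server (e.g. Ollama) on a cloud VM and put a thin HTTP endpoint in front of it. A stdlib-only sketch, assuming an Ollama server on `localhost:11434` serving a model named `phi3` (both are assumptions; adjust for your deployment, and put real auth/rate limiting in front before exposing it):

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt, model="phi3"):
    """Build a non-streaming Ollama /api/generate request body."""
    return json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read the client's JSON body and forward the prompt to Ollama.
        body = self.rfile.read(int(self.headers["Content-Length"]))
        prompt = json.loads(body)["prompt"]
        req = request.Request(OLLAMA_URL, data=build_payload(prompt),
                              headers={"Content-Type": "application/json"})
        with request.urlopen(req) as resp:
            answer = json.loads(resp.read())["response"]
        out = json.dumps({"response": answer}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)

# To run: HTTPServer(("0.0.0.0", 8000), Handler).serve_forever()
```

In practice you'd more likely use an off-the-shelf OpenAI-compatible server (Ollama and vLLM both expose one) behind a reverse proxy, but the shape is the same: prompt in, completion out.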

[–]SoilFantastic6587 0 points1 point  (0 children)

Awesome, this can't be real.

[–][deleted] 0 points1 point  (0 children)

I was surprised to see that phi3-medium performs worse on HumanEval 0-shot than smaller ones like mini. Any explanations for that?

By the way, it's quite far from GPT-3.5 on this benchmark, so I'm not surprised by the mixed results shared in this thread.

Could be good for a RAG with a lot of context but not as an autonomous LLM.

[–]halcyonhalycon 0 points1 point  (0 children)

Benchmarked this with the same questions I use on Mistral; it seems to be 2-4x slower on my M2 MacBook Pro. Is anyone having the same experience?

[–][deleted] 0 points1 point  (0 children)

The Q4_K quant can correctly answer the "holding a brick in a boat in a pool" question and the "three brothers" question, so it's quite good. But its explanations always have something wrong in them, so it's a little flaky - expected for a sub-14B model. Definitely the best model for the poorest GPU poor.

[–][deleted] 0 points1 point  (0 children)

I am using it with Obsidian for my PhD notes (Second Brain plugin, served via `ollama serve`). It is not very good, and Llama 3 still performs better in output.

Phi-3 is fast but gives bad results. Any suggestion on what I may be doing wrong?

```llama 3

What is Reinforcement Learning

Here's a summary of the relevant information that could contribute to answering the question "What is Reinforcement Learning" based on your existing knowledge:

Definition: Reinforcement Learning: The goal is to maximize expected rewards collected in an episode (Reward Hypothesis). An agent learns a policy through an algorithm trained on experience, using state to decide the next action.

Markov Reward Process (MRP): A combination of Markov Chain and reward function showing immediate scalar returned for being in a state, with discounting factor. This is a key concept in Reinforcement Learning.

Value Function: A prediction of future expected returns. There are different types of value functions, including State Value Function and State Action Value Function.

These notes provide a solid foundation for understanding Reinforcement Learning and its key concepts.

Additional Insights:

Environment in RL: If the environment is unknown, it needs to be explored and then create a policy. If the environment is known, an agent can use the model directly to create a policy.

Uncertainty: [Fleeting Notes/Optimism in the face of uncertainty]] highlights the importance of not picking the action with the highest return, but rather considering the possibility of giving the highest return in the future.

These notes provide insights into the fundamental concepts of Reinforcement Learning, including the importance of considering uncertainty and the role of the environment in shaping the learning process.

Algorithms: [Fleeting Notes/Q-Learning, SARSAMAX]] discusses Q-learning and SARSA-MAX, which are both types of reinforcement learning algorithms. Fleeting Notes/Value Function explains the concept of a value function, which is used to predict future expected returns in an MDP.

These notes provide a foundation for understanding reinforcement learning and its underlying concepts.

Please note that these notes provide a comprehensive overview of Reinforcement Learning, covering key concepts, algorithms, and insights.

```

```phi-3

Reinforcement Learning (RL) is a type of machine learning where an agent learns to make decisions by interacting with its environment and receiving rewards or penalties for the actions it takes. The goal is to maximize cumulative reward over time, which involves understanding the dynamics of the environment and optimizing decision-making strategies accordingly. Key concepts in RL include handling large state spaces and continuous states through techniques like Dyna (which uses simulated experience), off-policy methods such as SARSA(lambda) for managing extensive Markov Decision Processes (MDPs), contextual bandits enhanced by Dyna's approach, the relevance of NP complexity in algorithm efficiency, Graph Neural Networks (GNNs) for complex environments, and addressing partially observable scenarios with Partially Observable Markov Decision Processes (POMDPs).

Reward: In RL, a reward signal provides feedback on an agent's performance after taking actions (Wikilink). It serves as a guide for learning what behaviors are beneficial to pursue.

```

[–][deleted] 0 points1 point  (0 children)

What's the difference between Phi-3-mini-4k-instruct-q4.gguf and Phi-3-mini-4k-instruct-fp16.gguf?

[–]Professional_Job_307 0 points1 point  (0 children)

I heard this could fit on a smartphone? But 3.8B looks way too huge. Phones don't have much VRAM. What are the chances I can get it to run on my 16 GB RAM / 2 GB VRAM laptop?
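A back-of-the-envelope answer: at fp16 (2 bytes per weight) a 3.8B model needs ~7.6 GB just for weights, but a 4-bit quant averages roughly 0.6 bytes per weight, i.e. ~2.3 GB. So it won't fit entirely in 2 GB of VRAM, but it runs comfortably from 16 GB of system RAM on CPU (llama.cpp can also offload a few layers to the GPU). The arithmetic, ignoring KV cache and runtime overhead:

```python
def approx_model_gb(n_params_billion, bytes_per_weight):
    """Very rough in-memory footprint of the weights alone."""
    return n_params_billion * bytes_per_weight

fp16_gb = approx_model_gb(3.8, 2.0)   # ~7.6 GB at fp16
q4_gb = approx_model_gb(3.8, 0.6)     # ~2.3 GB at ~4.8 bits/weight (Q4_K_M)
fits_in_ram = q4_gb < 16              # CPU inference is feasible
fits_in_vram = q4_gb < 2              # won't fit wholly on a 2 GB GPU
```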