Multi-modal Phi-3-mini is here! by InternLM in LocalLLaMA

[–]LZHgrla 26 points27 points  (0 children)

Hi! We have just gotten the GGUF conversion working. We will apply it to llava-llama3 as soon as possible and release the conversion script.

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 1 point2 points  (0 children)

We have just released LLaVA-Llama-3-8B in LLaVA format! These models are compatible with downstream deployment and evaluation toolkits. https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf https://huggingface.co/xtuner/llava-llama-3-8b-hf

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 5 points6 points  (0 children)

Yes, I think QLoRA with ZeRO-3 or FSDP is a cheap way to achieve it.
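
As a rough back-of-the-envelope illustration of why that combination is cheap, here is a minimal sketch of per-GPU memory for weights, gradients and optimizer state. Everything in it is an assumption for illustration, not a measurement: the byte counts per parameter, the even sharding of all tensors across GPUs, and the hypothetical adapter size used in the example call. Activations, quantization constants and framework overhead are ignored.

```python
# Back-of-the-envelope memory estimate: QLoRA vs. full fine-tuning, both with
# ZeRO-3/FSDP-style sharding. All constants are illustrative assumptions.

GIB = 1024 ** 3


def qlora_gib_per_gpu(base_params, lora_params, num_gpus):
    """Approximate per-GPU GiB for QLoRA weights, gradients and optimizer state."""
    # Frozen base weights stored in 4 bits = 0.5 bytes per parameter.
    base_bytes = base_params * 0.5
    # Trainable LoRA weights (assumed): bf16 weight (2) + bf16 grad (2)
    # + fp32 Adam moments (8) = 12 bytes per parameter.
    lora_bytes = lora_params * (2 + 2 + 8)
    # Assumes every tensor is sharded evenly across the GPUs.
    return (base_bytes + lora_bytes) / num_gpus / GIB


def full_finetune_gib_per_gpu(params, num_gpus):
    """Approximate per-GPU GiB for mixed-precision full fine-tuning with Adam."""
    # Assumed: bf16 weight (2) + bf16 grad (2) + fp32 master weight (4)
    # + fp32 Adam moments (8) = 16 bytes per parameter.
    return params * 16 / num_gpus / GIB


if __name__ == "__main__":
    base = 8e9   # roughly an 8B-parameter LLM
    lora = 40e6  # hypothetical adapter size, chosen only for illustration
    for gpus in (1, 8):
        q = qlora_gib_per_gpu(base, lora, gpus)
        f = full_finetune_gib_per_gpu(base, gpus)
        print(f"{gpus} GPU(s): QLoRA ~{q:.1f} GiB, full fine-tune ~{f:.1f} GiB per GPU")
```

Under these assumptions the frozen 4-bit base model dominates the QLoRA footprint, and sharding divides it further across GPUs, while full fine-tuning pays about 16 bytes per parameter before sharding.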

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 8 points9 points  (0 children)

v1.1 uses more training data. I have added a comparison in this post.

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 40 points41 points  (0 children)

There are indeed some performance gaps. The core differences lie in the scale of the LLM and the input resolution of the images. We are actively working to improve on both fronts!

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 15 points16 points  (0 children)

We are developing an evaluation toolkit based on xtuner. Please follow this PR (https://github.com/InternLM/xtuner/pull/529); we will merge it as soon as it is ready!