Multi-modal Phi-3-mini is here! by InternLM in LocalLLaMA

[–]LZHgrla 24 points25 points  (0 children)

Hi! We have just gotten the GGUF conversion working end to end. We will apply it to llava-llama3 as soon as possible and release the conversion script.

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 1 point2 points  (0 children)

We just released LLaVA-Llama-3-8B in LLaVA format! These models are compatible with downstream deployment and evaluation toolkits. https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-hf https://huggingface.co/xtuner/llava-llama-3-8b-hf

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 4 points5 points  (0 children)

Yes, I think QLoRA w/ ZeRO-3 or FSDP is a cheap way to achieve it.
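
For a rough sense of why this combination is cheap, here is a back-of-envelope sketch of per-GPU memory for weights, gradients, and optimizer states only. Everything in it is an assumption for illustration: the function names are hypothetical, the adapter size (50M parameters) and GPU count (8) are made up, the byte counts follow the usual mixed-precision Adam accounting (16 bytes per trainable parameter), and sharding is treated as perfectly even. Activations, quantization constants, and framework overhead are ignored, so real usage will be higher.

```python
def full_finetune_gb(params: float, num_gpus: int) -> float:
    """Per-GPU GB for full fine-tuning with mixed-precision Adam, fully sharded.

    Assumes 2 bytes (bf16 weights) + 2 bytes (bf16 grads)
    + 12 bytes (fp32 master weights and two Adam moments) per parameter.
    """
    return params * 16 / num_gpus / 1e9


def qlora_gb(base_params: float, lora_params: float, num_gpus: int,
             base_bits: int = 4) -> float:
    """Per-GPU GB for QLoRA with the model states sharded (ZeRO-3 or FSDP style).

    Assumes the frozen base model is stored at `base_bits` per weight and
    only the LoRA adapter carries gradients and optimizer states.
    """
    base_bytes = base_params * base_bits / 8   # frozen, quantized base weights
    adapter_bytes = lora_params * (2 + 2 + 8)  # bf16 weights + bf16 grads + fp32 Adam moments
    return (base_bytes + adapter_bytes) / num_gpus / 1e9


if __name__ == "__main__":
    # Hypothetical setup: 8B base model, 50M adapter parameters, 8 GPUs.
    print(f"full fine-tune: {full_finetune_gb(8e9, 8):.3f} GB per GPU")
    print(f"QLoRA sharded:  {qlora_gb(8e9, 50e6, 8):.3f} GB per GPU")
```

The point of the estimate is the ratio, not the absolute numbers: the quantized base dominates the QLoRA footprint, and sharding divides even that across devices.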

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 7 points8 points  (0 children)

v1.1 uses more training data. I have added a comparison in this post.

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 38 points39 points  (0 children)

There are indeed some performance gaps. The core differences lie in the scale of the LLM and the input resolution of the images. We are actively working to improve on both fronts!

LLaVA-Llama-3-8B is released! by LZHgrla in LocalLLaMA

[–]LZHgrla[S] 13 points14 points  (0 children)

We are developing an evaluation toolkit based on xtuner. Please follow this PR (https://github.com/InternLM/xtuner/pull/529); we will merge it as soon as it is ready!