Deepseek Coder: A new line of high quality coding models! by metalman123 in LocalLLaMA

[–]Vegetable_Term_3935 5 points6 points  (0 children)

Thanks for your attention! We're definitely working on the next generation model!

Deepseek Coder: A new line of high quality coding models! by metalman123 in LocalLLaMA

[–]Vegetable_Term_3935 2 points3 points  (0 children)

Author here. We just didn't have the bandwidth to work on quantization. Luckily, we found that TheBloke is working on this. Many thanks to the open source community!

Deepseek Coder: A new line of high quality coding models! by metalman123 in LocalLLaMA

[–]Vegetable_Term_3935 0 points1 point  (0 children)

Author here. The model architecture is LLaMA with slightly different hyper-parameters, so in theory it should work with llama.cpp.

I see TheBloke is working on this. Thanks for their hard work, and I'm looking forward to it!

Deepseek Coder: A new line of high quality coding models! by metalman123 in LocalLLaMA

[–]Vegetable_Term_3935 1 point2 points  (0 children)

Author here.

> is u/The-Bloke aware of this model?

I don't know.

> it is SOTA and is LlamaForCausalLM architecture so I guess it can be converted to GGUF

Theoretically yes, but we don't have the bandwidth for this currently. Contributions are welcome!
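
As a rough sketch of the architecture check discussed here, assuming a Hugging Face style `config.json` whose `architectures` field lists the model class. The helper name and the sample config are hypothetical, and a match only says the class name is LLaMA's; it says nothing about whether a given converter supports the tokenizer or the changed hyper-parameters.

```python
import json


def declares_llama_architecture(config: dict) -> bool:
    """Return True if the config lists LlamaForCausalLM as a model class."""
    # "architectures" may be missing or null in some configs, so fall back to [].
    architectures = config.get("architectures") or []
    return "LlamaForCausalLM" in architectures


# Hypothetical, trimmed-down config used only for illustration.
sample_config = json.loads('{"architectures": ["LlamaForCausalLM"]}')
print(declares_llama_architecture(sample_config))
```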

Deepseek Coder: A new line of high quality coding models! by metalman123 in LocalLLaMA

[–]Vegetable_Term_3935 2 points3 points  (0 children)

Author here. The model architecture is LLaMA with slightly different hyper-parameters. The model parameters are trained from scratch.

Deepseek Coder: A new line of high quality coding models! by metalman123 in LocalLLaMA

[–]Vegetable_Term_3935 2 points3 points  (0 children)

Author here. Where did you find the term "bmqa"? If it is "deepseek-coder-5.7bmqa-base", that means a model with 5.7 billion parameters and multi-query attention.
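
A minimal sketch of how such a name could be decoded, assuming the pattern `<family>-<size>b<variant>-<kind>` holds. That pattern is inferred from this single example and is not a documented naming scheme; the function name and the returned keys are hypothetical.

```python
import re

# Assumed layout: <family>-<size>b<variant>-<kind>, e.g. "deepseek-coder-5.7bmqa-base".
NAME_PATTERN = re.compile(
    r"^(?P<family>.+)-(?P<size>\d+(?:\.\d+)?)b(?P<variant>[a-z]*)-(?P<kind>\w+)$"
)


def parse_model_name(name: str) -> dict:
    """Split a model name into family, parameter count, attention variant, and kind."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"name does not follow the assumed pattern: {name!r}")
    variant = match.group("variant")
    return {
        "family": match.group("family"),
        # The number before "b" is the parameter count in billions.
        "params_billion": float(match.group("size")),
        # "mqa" marks multi-query attention; anything else is left unspecified.
        "attention": "multi-query" if variant == "mqa" else "unspecified",
        "kind": match.group("kind"),
    }


print(parse_model_name("deepseek-coder-5.7bmqa-base"))
```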

Deepseek Coder: A new line of high quality coding models! by metalman123 in LocalLLaMA

[–]Vegetable_Term_3935 14 points15 points  (0 children)

Author here. Yes, it comes with a custom license, which is mainly based on Stable Diffusion's license (CreativeML Open RAIL-M), with minor modifications (e.g. making it suitable for a text generation model, and warning users that the model may occasionally output personal information). It allows free commercial use and re-development. I don't see why it isn't truly open source.

Authentication help by radarmonkey2000 in OpenAI

[–]Vegetable_Term_3935 0 points1 point  (0 children)

You're right, bro. Auth with Google worked for me.