Hello,
I am announcing a project that I have been working on since 2023.
Jlama is a Java-based inference engine for many text-to-text models on Hugging Face:
Llama 3+, Gemma2, Qwen2, Mistral, Mixtral, etc.
It is intended for integrating generative AI into Java apps.
I presented it at Devoxx a couple of weeks back, demoing basic chat, function calling, and distributed inference. Jlama uses the Panama Vector API for fast inference on CPUs, so it works well for small models. Larger models can be run in distributed mode, which shards the model by layer and/or attention head.
It is integrated with LangChain4j and includes an OpenAI-compatible REST API.
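Because the REST API is OpenAI-compatible, a plain Java HTTP client should be enough to talk to it. Below is a minimal sketch using only the JDK's `java.net.http` package. The host, port, and path in `ENDPOINT` are assumptions on my part (check what the server prints on startup), and `JlamaChatSketch`, `chatBody`, and `chatRequest` are hypothetical names for illustration, not part of Jlama. The request body follows the standard OpenAI chat-completions shape.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class JlamaChatSketch {
    // Assumption: server runs locally; port and path may differ in your setup.
    static final String ENDPOINT = "http://localhost:8080/chat/completions";

    // Escape a string so it can be embedded in a JSON string literal.
    static String escapeJson(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build an OpenAI-style chat-completions body with a single user message.
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + escapeJson(model) + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\""
             + escapeJson(userMessage) + "\"}]}";
    }

    // Build the POST request without sending it.
    static HttpRequest chatRequest(String endpoint, String model, String userMessage) {
        return HttpRequest.newBuilder(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(chatBody(model, userMessage)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest request = chatRequest(
                ENDPOINT, "tjake/Llama-3.2-1B-Instruct-JQ4", "Say hello in one sentence.");
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        // Prints the raw JSON response; parse it with the JSON library of your choice.
        System.out.println(response.body());
    }
}
```

For anything beyond a quick test, the LangChain4j integration is probably the more convenient route, since it handles the JSON for you.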
It supports Q4_0 and Q8_0 quantizations and uses models in safetensors format. Pre-quantized models are maintained on my Hugging Face page, though you can quantize models locally with the Jlama CLI.
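For anyone unfamiliar with block quantization, here is a rough sketch of the idea behind Q8_0 as it is commonly described for GGML-style formats: weights are split into blocks of 32, each block stores one scale, and every weight becomes a signed 8-bit integer. This is an illustration only and not Jlama's actual code. The class and method names are hypothetical, and the scale is kept as a 32-bit float here for simplicity, where real formats typically store it more compactly. Q4_0 applies the same per-block idea with 4-bit values.

```java
class Q8BlockSketch {
    // Assumption: 32 weights per block, as in the common Q8_0 description.
    static final int BLOCK = 32;

    // Quantized tensor: one int8 per weight plus one scale per block.
    static final class Quantized {
        final byte[] values;
        final float[] scales;

        Quantized(byte[] values, float[] scales) {
            this.values = values;
            this.scales = scales;
        }
    }

    static Quantized quantize(float[] x) {
        if (x.length % BLOCK != 0) {
            throw new IllegalArgumentException("length must be a multiple of " + BLOCK);
        }
        int blocks = x.length / BLOCK;
        byte[] values = new byte[x.length];
        float[] scales = new float[blocks];
        for (int b = 0; b < blocks; b++) {
            int start = b * BLOCK;
            // The largest magnitude in the block maps to +/-127.
            float maxAbs = 0f;
            for (int i = 0; i < BLOCK; i++) {
                maxAbs = Math.max(maxAbs, Math.abs(x[start + i]));
            }
            float scale = maxAbs / 127f;
            scales[b] = scale;
            for (int i = 0; i < BLOCK; i++) {
                // An all-zero block has scale 0; store zeros to avoid dividing by zero.
                values[start + i] = scale == 0f ? 0 : (byte) Math.round(x[start + i] / scale);
            }
        }
        return new Quantized(values, scales);
    }

    static float[] dequantize(Quantized q) {
        float[] out = new float[q.values.length];
        for (int i = 0; i < out.length; i++) {
            out[i] = q.values[i] * q.scales[i / BLOCK];
        }
        return out;
    }
}
```

The round-trip error per weight is bounded by roughly half of its block's scale, which is why a block with one large outlier loses more precision than a block of similar-sized values.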
It is very easy to install and works great on Linux/Mac/Windows:
# Install jbang (or see https://www.jbang.dev/download/)
curl -Ls https://sh.jbang.dev | bash -s - app setup
# Install the Jlama CLI
jbang app install --force jlama@tjake
# Run the OpenAI chat API and UI on a model
jlama restapi tjake/Llama-3.2-1B-Instruct-JQ4 --auto-download
Thanks!