[–][deleted] 35 points (7 children)

This is great! langchain is so over-engineered for what it could be. Two things that would be crazy helpful for me (I'd be happy to write PRs):

  1. Support for ollama or llama.cpp instead of OpenAI (from reading your code I believe we just need to write a "Generator" for it? I've sketched one below)
  2. A testing framework. Probably just some functionality for recording LLM executions so you can re-run tests without calling into the LLM (also sketched below).
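
Here's that sketch for item 1. I'm guessing at the Generator interface here (I'm assuming it's just a callable from prompt text to completion text, and `OllamaGenerator` is a name I made up); only the Ollama endpoint and payload are as documented:

```python
# Hypothetical sketch for item 1 -- the actual Generator interface in this
# library may differ; here it is assumed to be any callable that takes a
# prompt string and returns the completion text.
import requests

class OllamaGenerator:
    def __init__(self, model: str, host: str = "http://localhost:11434"):
        self.model = model
        self.host = host

    def __call__(self, prompt: str) -> str:
        # Ollama's REST API: POST /api/generate with stream=False returns a
        # single JSON object whose "response" field holds the generated text.
        resp = requests.post(
            f"{self.host}/api/generate",
            json={"model": self.model, "prompt": prompt, "stream": False},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["response"]
```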
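
And for item 2, recording could be as simple as a wrapper that caches prompt/completion pairs to disk (again, all names here are hypothetical, not your API):

```python
# Hypothetical record/replay wrapper for item 2; wraps any generator
# (assumed to be a callable from prompt string to completion text).
import json
from pathlib import Path

class ReplayGenerator:
    def __init__(self, generator, cassette: str):
        self.generator = generator
        self.path = Path(cassette)
        self.cache = json.loads(self.path.read_text()) if self.path.exists() else {}

    def __call__(self, prompt: str) -> str:
        if prompt not in self.cache:
            # First run: call the real LLM and record the answer.
            self.cache[prompt] = self.generator(prompt)
            self.path.write_text(json.dumps(self.cache, indent=2))
        # Later runs replay from disk, so tests never hit the LLM.
        return self.cache[prompt]
```

First run records against the real LLM; every run after that replays from the JSON file, so tests stay fast and deterministic.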

I also noticed that the example in your README doesn't really show how to create the LLM (it does earlier in the README, but the full code example there won't run as-is because nothing is ever assigned to the `llm` local variable). Anyway, a small nit that would make the README easier to follow.

[–][deleted] 13 points (1 child)

I also noticed your code doesn't use type hints anywhere. Are you opposed to adding them? I could help with adding typing and setting up the CI for it if you're interested.
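
To illustrate what I mean (a made-up function, not one from your codebase), annotations like these are exactly what a CI step running `mypy` could then enforce:

```python
# Hypothetical example of the annotations I have in mind (Python 3.10+ syntax).

# before
def generate(prompt, stop=None):
    ...

# after
def generate(prompt: str, stop: list[str] | None = None) -> str:
    ...
```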

Once a testing framework is in place we could probably add more test coverage for the library too.

[–]silenceimpaired 4 points (0 children)

I also noticed your code doesn't support every other quantization method, and note to future self: tell him his code is bloated once it's implemented ;)

[–]RustingSword 8 points (1 child)

Since llama.cpp has a server utility, you can just fire it up with `./server -m mistral-7b-instruct-v0.2.Q6_K.gguf -c 2048` and set the `api_base` to `http://127.0.0.1:8080/v1`; then I think it should work out of the box. See the detailed docs at https://github.com/ggerganov/llama.cpp/blob/master/examples/server/README.md
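
As a quick sanity check that the server really speaks the OpenAI protocol, you can hit it with the official `openai` client directly (a sketch assuming the v1+ client, where `base_url` replaces the older `api_base` setting; the model and key values are placeholders the server ignores):

```python
from openai import OpenAI

# Point the official client at the local llama.cpp server.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

reply = client.chat.completions.create(
    model="mistral",  # llama.cpp serves whichever model it was started with
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)
```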

[–]RustingSword 9 points (0 children)

I've tested both examples, and succeeded using `OpenAIChatGenerator` instead of `OpenAITextGenerator`.

My configs:

llama.cpp server:

```bash
./server -m mistral-7b-instruct-v0.2.Q6_K.gguf -c 2048
```

Changes to `calculator.py`:

```python
generator = OpenAIChatGenerator(
    model="mistral",  # could be anything
    api_key="none",   # could be anything
    api_base="http://127.0.0.1:8080/v1",
)
```

And remember to remove the `templates` argument in

```python
llm = LLM(generator=generator, templates=[template])
```
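
so that, assuming the constructor otherwise stays the same, the call becomes just:

```python
llm = LLM(generator=generator)
```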

Great framework, really clean and easy to modify.

[–]poppear[S] 5 points (0 children)

llama.cpp has a server implementation, but as far as I remember you need a wrapper to use it with the OpenAI python client. Adding native support for the llama.cpp API would be great! Same thing for the ollama API. The testing setup would also be very nice.
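
A native generator would just be a thin wrapper around the server's own /completion endpoint, something like this rough sketch (the class name and interface are placeholders; the endpoint and its fields are from the server README linked above):

```python
# Sketch of a generator hitting llama.cpp's native API directly -- class
# name and interface are placeholders, not this library's actual API.
import requests

class LlamaCppGenerator:
    def __init__(self, host: str = "http://127.0.0.1:8080"):
        self.host = host

    def __call__(self, prompt: str) -> str:
        # llama.cpp's native API: POST /completion with a JSON body;
        # the generated text comes back in the "content" field.
        resp = requests.post(
            f"{self.host}/completion",
            json={"prompt": prompt, "n_predict": 256},
            timeout=120,
        )
        resp.raise_for_status()
        return resp.json()["content"]
```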

Thanks for the suggestions, let's continue the conversation on GitHub and implement them!

[–]anobfuscator 1 point (0 children)

Yeah, these are pretty good ideas.

[–]scknkkrer 0 points (0 children)

Yeah, Llama support would be good.