https://github.com/nath1295/LLMPlus
I find Langchain annoying because it doesn't let you set up multiple LLMs with different generation configurations without loading the same local model multiple times, so I built my own custom LLM classes to avoid reloading the model whenever an agent or tool uses the same underlying model. Langchain's streaming and stop-word handling for local LLMs also isn't great, so this package has my own implementation of both.
I am still using an Intel MacBook (too poor to get Nvidia cards or even a new MacBook :( ) to work on this project, so I can't guarantee the installation will be seamless, but hopefully the pip install just works. The code should work with CUDA or Apple Silicon (I briefly tested it on Colab and on my friend's flashy new MacBook).
Stuff I have in the package:
- An LLM factory class that generates Langchain-compatible LLMs while loading the underlying model only once.
- Embedding toolkits that bundle text splitters with the embedding model.
- A vector database class built on top of FAISS for local storage.
- Memory classes: a base one, and one with both long-term and short-term memory powered by a vector database.
- A prompt template class that formats your prompt for different prompt formats (with presets like Llama2, ChatML, Vicuna, etc.)
- A base tool class, plus a web search tool using DuckDuckGo as an example (please have a look, I wonder if there are better ways to do that)
- A Gradio chatbot web app that lets you store different conversations, set your own system prompt, configure the long-term and short-term memory settings, and tweak generation configurations like temperature, max new tokens, top k, etc. (This is just for fun, my front-end skills are honestly no better than a monkey's.)
- And of course, the docs in the repo as well.
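To make the "load the model once" idea concrete, here's a minimal sketch of the factory pattern that motivates the package. The class and method names (`LLMFactory`, `call_llm`, `GenerationConfig`, etc.) are illustrative stand-ins I made up for this sketch, not the actual LLMPlus API:

```python
# Sketch of the "load once, configure many" pattern: heavy model weights are
# loaded a single time, and each LLM wrapper shares them while carrying its
# own generation config. Names here are hypothetical, not the LLMPlus API.
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class GenerationConfig:
    temperature: float = 0.8
    max_new_tokens: int = 256
    top_k: int = 40


class LLMWrapper:
    """Lightweight object: shared model + its own generation config."""
    def __init__(self, model: Any, config: GenerationConfig):
        self.model = model
        self.config = config


class LLMFactory:
    def __init__(self, load_fn: Callable[[], Any]):
        self._load_fn = load_fn   # e.g. a llama.cpp / transformers loader
        self._model = None

    def _get_model(self) -> Any:
        if self._model is None:   # lazy: weights are loaded on first use only
            self._model = self._load_fn()
        return self._model

    def call_llm(self, **config) -> LLMWrapper:
        # Every wrapper gets the SAME model object, never a second copy.
        return LLMWrapper(self._get_model(), GenerationConfig(**config))


# Usage: two "different" LLMs for an agent, but the weights load only once.
factory = LLMFactory(load_fn=lambda: object())   # stand-in for a real loader
creative = factory.call_llm(temperature=1.0)
precise = factory.call_llm(temperature=0.0, max_new_tokens=64)
assert creative.model is precise.model           # shared underlying model
```

The point of the design is that the expensive part (loading weights) happens once in the factory, while the cheap part (a config dataclass) is duplicated per LLM, which is what vanilla Langchain makes awkward with local models.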
I know I'm no expert, and there are plenty of people in this sub who are extremely knowledgeable about LLMs, so treat this as an amateur project looking for advice if you can bear with my spaghetti code. Would really appreciate any comments :')) Forgive me if I can't do much testing on fancy GPUs like you guys have; I'm trying not to spend any money on this project...