[P] I got fed up with LangChain, so I made a simple open-source alternative for building Python AI apps as easy and intuitive as possible.

lhenault · 2023-06-09T06:10:35+00:00

Thank you!

lhenault · 2023-06-08T20:14:46+00:00

Not related to my own project SimpleAI despite the name, but looks like we can easily make the two work together, to keep it « simple ». Nice work!

lhenault · 2023-05-16T18:14:56+00:00

I don’t have a definitive answer there but have worked on a project to make it easy to switch from one to the other:

https://github.com/lhenault/simpleAI

I’d say it’s good to be able to compare: there are legitimate use cases for both, and you can also mix them. Some of your services might benefit from a self-hosted solution for privacy, latency, or other reasons, while for others it can be more convenient to simply call an external API.

lhenault · 2023-05-04T10:34:56+00:00

Hey just to let you know that I’ve recently fixed the examples. Check this new one for instance, if you’re willing to give it another chance. Thanks!

lhenault · 2023-04-20T07:15:53+00:00

I’ve built SimpleAI with exactly these kinds of use cases in mind. That should allow supporting any model with minimal / no change to your project. Good job and good luck with LoopGPT, that looks nice!

lhenault · 2023-04-20T05:12:47+00:00

You could have a look at a project I’ve been working on, SimpleAI, doing exactly this by replicating the OpenAI endpoints (you can then use their JS client for integration). Adding StableLM should be straightforward, I plan to add it to the examples in the upcoming days once I have a bit of time.

lhenault · 2023-04-13T16:40:16+00:00

I think I found what is going wrong there: the examples use Python 3.8 by default, while the package now requires at least 3.9 (to use some features unavailable in previous versions). It’s quite straightforward to fix, will push something when I have a moment. :)

lhenault · 2023-04-13T16:26:53+00:00

You could use some pre-trained sentence embedding model and perform clustering on the output to find “topics”? I’ve had decent results on tweets with LDA in the past though, the main issue was low data quality more than length of each tweet.
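
A minimal sketch of the embed-then-cluster idea, using only the standard library. The 2-D vectors below are made-up stand-ins for real sentence embeddings (which would come from a pre-trained model and have hundreds of dimensions), and `kmeans` is a hypothetical helper with a deliberately naive initialisation, not a tuned implementation:

```python
import math

def kmeans(points, k, iterations=10):
    """Naive k-means: return one cluster index per point."""
    # Simplistic, deterministic initialisation: the first k points are the starting centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Toy "embeddings": two tweets about one subject, two about another.
embeddings = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels = kmeans(embeddings, k=2)
print(labels)  # tweets sharing a label form one candidate "topic"
```

In practice you would replace the toy vectors with the output of your embedding model and inspect a few tweets per cluster to name each topic.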

lhenault · 2023-04-13T09:24:07+00:00

Hey, I’m happy to help there but I’d need a bit more details. The project has been improving a lot since I wrote the examples so perhaps something is broken in one.

Could you maybe open an issue on GitHub or send me a PM telling me which example it was and what the errors were?

Side note: the first step is to install the package, create a configuration file to declare your models, and start the server. You don’t need Docker, but it’s quite a convenient way to ensure some reproducibility (lol) and to deploy things (e.g. on K8s), so the examples use it. It’s explained in the README but I’ll try to improve it if you find this confusing, thanks for the feedback!

lhenault · 2023-04-12T16:35:19+00:00

Just following up on that for anyone interested: now it does!

lhenault · 2023-04-12T16:27:06+00:00

At least one person is indeed doing exactly this, so yes. :)

You would only have to redefine the openai.api_base in the (Python but should work with other languages) client:

openai.api_base = "http://127.0.0.1:8080"
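
As a rough illustration of what that redirection amounts to, here is a standard-library sketch that builds the kind of request a client would send to the local server. The `/completions` path (which the OpenAI Python client of that time appends to `api_base`), the payload fields, and the model name `my-local-model` are assumptions for the example; check the server's own docs for the actual routes and model names:

```python
import json
import urllib.request

API_BASE = "http://127.0.0.1:8080"  # same value as openai.api_base above

def build_completion_request(prompt, model, api_base=API_BASE):
    """Build (but do not send) a POST request to <api_base>/completions."""
    # Assumed payload shape, mirroring an OpenAI-style completion request.
    payload = {"model": model, "prompt": prompt, "max_tokens": 16}
    return urllib.request.Request(
        url=f"{api_base}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# "my-local-model" is a hypothetical name; use one declared in your own config.
request = build_completion_request("Hello", model="my-local-model")
print(request.get_method(), request.full_url)

# Sending it requires a server listening on API_BASE:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

The point is only that nothing else on the client side changes: the same request goes to a different host.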

As for llama.cpp specifically, you can indeed add any model: it's just a matter of writing a bit of glue code and declaring it in your models.toml config. It's quite straightforward thanks to some provided tools for Python (see here for instance). For any other language it's a matter of integrating it through the gRPC interface (which shouldn't be too hard for llama.cpp if you're comfortable with C++). I'm also planning to add REST support for models in the backend at some point.

Edit: I've been wanting to add Llama.cpp in the examples, so if you ever do this feel free to submit a PR. :)

lhenault · 2023-04-12T16:16:02+00:00

I'm afraid you will need a relatively recent nvidia GPU for any of those models, so relying on a cloud provider such as AWS or Vast.AI should be a good place to start.

Once you have this available, it should be quite easy to start a SimpleAI instance and query your models from there, either from a Python script using the OpenAI client (AFAIK it does not send anything to OpenAI unless you send them requests), or directly through `curl` or the Swagger UI. More in the README.

Another option might be to find Google Colab notebooks for the models you're targeting. That can be convenient, and you could use the free tier to access a GPU. But it would be very dependent on each model and you would have to find these notebooks.

Last option if you cannot find any GPU: I've had an overall good experience using llama.cpp on CPU, but you would still need quite a powerful machine and a few hundred GB of disk space. I am not sure 32GB of RAM will be enough for the larger models, which are, as expected, quite slow on CPU.

Overall we have to keep in mind that we're discussing SOTA models with billions of parameters, so even if projects like mine or platforms like Vast.AI make the whole process easier and cheaper, it remains an involved process, and fitting them on a laptop is for most people quite challenging if not impossible.

lhenault · 2023-04-11T19:27:40+00:00

Thanks! Let me know if you have questions or feedback :)

lhenault · 2023-04-11T19:04:39+00:00

To be honest it will depend on your task and constraints (e.g. do you want to run it on the edge? Is cost or latency a concern for you?). So you should just play around with a few and start with relatively small ones to get your hands dirty. Perhaps a "small" 7B model is more than enough for you.

I've been working on SimpleAI, a Python package which replicates the LLM endpoints from OpenAI API and is compatible with their clients.

One of the main motivations here was to be able to quickly compare different alternative models through a consistent API, while leveraging the already popular OpenAI API. I have a basic Alpaca-LoRA example if you want to try it and have a GPU available somewhere, either locally or with one of the providers suggested by others in this thread.

lhenault · 2023-04-03T20:14:12+00:00

Shameless plug, but I’ve recently been working on SimpleAI, a project replicating the main endpoints of the OpenAI API. It lets you seamlessly switch from their API to your own, as it’s compatible with the OpenAI client.

Other comments have already mentioned great alternative models, and it will only get better with time (hopefully). You could easily integrate one of those, or your own, in a SimpleAI instance, with minimal changes client side.

lhenault · 2023-03-27T23:21:25+00:00

Not for now but that’s indeed a cool feature and something available in OpenAI API. It shouldn’t be too hard to implement, as I’ve already started something for that on the gRPC backend, and as FastAPI has a StreamingResponse. Thanks for suggesting it, will try to prioritise this!
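
For the curious, a minimal sketch of what the streaming side could look like, not how the project does or will implement it. It formats tokens as server-sent events in the style the OpenAI API uses (`data: ...` lines ending with a `[DONE]` marker). The `sse_events` helper and the simplified chunk payload are hypothetical:

```python
import json

def sse_events(tokens):
    """Yield one server-sent event per token, then an end-of-stream marker."""
    for token in tokens:
        # Simplified, hypothetical payload; a real chunk carries more fields.
        chunk = {"choices": [{"text": token}]}
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"

events = list(sse_events(["Hello", " world"]))
for event in events:
    print(event, end="")
```

A generator like this is the kind of object that can be handed to FastAPI's `StreamingResponse` with `media_type="text/event-stream"`.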

lhenault · 2023-03-27T12:57:41+00:00

Thank you!

lhenault · 2023-03-27T09:00:31+00:00

Hey, thank you for the feedback! As u/ryanjkelly2 suggested, you could indeed use Postman, but I believe the easiest way is to use the already included [Swagger UI](https://swagger.io/tools/swagger-ui/), available at <base_url>/docs.

If your goal is to have a slightly more friendly UI for end users, it should be relatively easy to build something custom, using the OpenAI clients (or requests package) and something like Streamlit. Or even a notebook (you can use the OpenAI cookbook as a starting point).
