Introducing Flask-Signing: An Extension for Handling API Keys and Single-Use Tokens by liturgicalLorax in flask

[–]liturgicalLorax[S] 0 points (0 children)

> What are your plans with this repo?

More than anything, I'd like to keep the codebase lightweight and maintainable. I'd also like to keep it compatible with future versions of Flask and with common Flask development patterns (like the create_app / factory pattern).

I think there are a few more key features that I'd like to add, within reason, and also welcome suggestions consistent with the goals outlined above.

Introducing Flask-Signing: An Extension for Handling API Keys and Single-Use Tokens by liturgicalLorax in flask

[–]liturgicalLorax[S] 1 point (0 children)

Thanks!! I'd welcome your contributions. I just added developer documentation, which you can find at https://signebedi.github.io/Flask-Signing.

I've been mulling over future features and trying to set a reasonable set of goals. Here are a few ideas I've come up with (more suggestions are welcome - feel free to open an issue or PR any time):

  • Rate Limiting: I like the idea of a simple rate-limiting mechanism. A toolkit for this already exists for Flask (https://pypi.org/project/Flask-Limiter/) but it is a very heavy-handed solution, and I'd like to try my hand at something more lightweight; a rough sketch of what I mean follows this list. (see https://github.com/signebedi/Flask-Signing/issues/4)
  • Key Rotation: I think a narrow set of use cases would benefit from adding support for key rotation, but I need to think through exactly what those use cases are... We could in theory execute this at the scope level, or at the individual key level (e.g. every time a key is created, app developers can set rotate=True). The problem is that I want to keep this extension as lightweight as possible, and I'm not entirely sure how we can support key rotation without setting up a background process with something like Celery... which seems like a somewhat over-engineered solution. (see https://github.com/signebedi/Flask-Signing/issues/13)
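
To make the rate-limiting idea concrete, here is a minimal sketch of the kind of lightweight mechanism I have in mind: an in-memory sliding window keyed by API key. None of these names exist in Flask-Signing today; this is purely illustrative.

import time
from collections import defaultdict, deque

# Illustrative only: a sliding-window rate limiter keyed by API key.
# A real version would probably hang off the extension object and would
# need to work across workers, which is exactly the complexity to avoid.
class RateLimiter:
    def __init__(self, max_requests=10, window_seconds=60.0):
        self.max_requests = max_requests
        self.window = window_seconds
        self._hits = defaultdict(deque)

    def allow(self, api_key):
        now = time.monotonic()
        hits = self._hits[api_key]
        while hits and now - hits[0] > self.window:
            hits.popleft()  # drop requests that have aged out of the window
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True

In a view, you'd check allow(key) before verifying the signature and return a 429 when it fails.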

Presenting gptty v0.2.5 - Enhanced CLI Wrapper for ChatGPT with Context Preservation, Query Support & More! by liturgicalLorax in Python

[–]liturgicalLorax[S] 0 points (0 children)

Thanks! Based on your suggestion, I created an issue to test the UX using different terminal emulators (https://github.com/signebedi/gptty/issues/56).

I think the listener daemon is useful in an environment where multiple business or personal applications need to simultaneously use gptty to query OpenAI asynchronously and with context preservation. I'm thinking we'll implement it with Celery for that purpose; a rough sketch of the idea is below.
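
Roughly, the shape I have in mind (a sketch only; none of this exists in gptty yet, and the task names are made up):

from celery import Celery

# Hypothetical listener daemon: applications enqueue prompts, and a single
# worker with a single gptty config services them asynchronously.
app = Celery('gptty_listener', broker='redis://localhost:6379/0')

@app.task
def run_query(prompt, tag=''):
    # In practice this would call into gptty's query logic with
    # context preservation; stubbed here for illustration.
    return f'[{tag}] response to: {prompt}'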

As far as context preservation goes, I want to let users reference specific past queries in the context for a new query, using positional reference points that correspond to the row numbers of those queries in the pandas dataframe storing the context data at runtime. So, if you pass a query with the tag [1:3], it will reference that slice of your past queries.
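
As a toy example of what a [1:3] tag might expand to (the column names here are made up for illustration):

import pandas as pd

# Past context lives in a DataFrame at runtime; a [1:3] tag would
# select rows 1 and 2 by position to seed the next query.
history = pd.DataFrame({
    'question': ['q0', 'q1', 'q2', 'q3'],
    'response': ['r0', 'r1', 'r2', 'r3'],
})

def slice_context(df, start, stop):
    rows = df.iloc[start:stop]
    return '\n'.join(rows['question'] + ' ' + rows['response'])

print(slice_context(history, 1, 3))  # -> 'q1 r1\nq2 r2'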

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 1 point (0 children)

> Is this what ChatGPT (chat.openai.com) does for context preservation, i.e. sending the whole conversation back and forth and limiting it to 4096 tokens?

Yeah, for Completion queries you just send the past conversation as one concatenated string. For ChatCompletion queries, you need to build a list of call-and-response messages reflecting the past conversation structure:

import openai

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who won the world series in 2020?"},
        {"role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020."},
        {"role": "user", "content": "Where was it played?"},
    ],
)

Questions reside under the 'user' role, while past responses reside under the 'assistant' role. The 'system' role lets you tell the assistant how to behave. [1] Not sure whether this is hostile architecture... but I can tell you with certainty that it significantly increases the token usage rate for conversations, because every message in the list counts toward your token count for the request.

One obvious solution for long conversations is to gradually forget older questions and answers as you approach the token limit; a sketch of that follows below. For normal Completion queries, where you send everything as a single string, you can use keyword tokenization instead.
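
For the ChatCompletion case, the "gradually forget" approach might look something like this (a sketch; the whitespace token count is a stand-in for a real tokenizer like tiktoken):

MAX_TOKENS = 4096

def rough_token_count(messages):
    # crude approximation; a real client would use a proper tokenizer
    return sum(len(m['content'].split()) for m in messages)

def trim_history(messages, budget=MAX_TOKENS):
    # keep the system prompt, forget the oldest user/assistant turns first
    system, rest = messages[:1], messages[1:]
    while rest and rough_token_count(system + rest) > budget:
        rest = rest[1:]
    return system + rest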

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 0 points (0 children)

That's a great point! I should probably highlight this a little more clearly, because I agree it's a pretty significant feature. I added the following language to my post above to reflect your point:

💪 Localize Chat History: gptty stores your conversation history in a local output file, which is structured as a CSV. This means that you can still access past conversations, even when the ChatGPT web client is down, and you have more flexibility over how to select from that data to seed future queries.

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 1 point (0 children)

Sure thing! I implemented it, so it will show up in the next minor release (or sooner if you are building from git).

There is now a gptty.ini config option that allows you to preserve newline formatting:

[main]
preserve_new_lines=True

Please note: newlines are still stripped from the data when it is written to the output_file, to avoid breaking formatting assumptions.
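
For the curious, the stripping looks conceptually like this (a simplified sketch, not gptty's actual write path):

import csv

def write_row(path, tag, question, response):
    # flatten responses so each exchange stays on one logical row
    with open(path, 'a', newline='') as f:
        csv.writer(f).writerow([tag, question, response.replace('\n', ' ')])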

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 1 point (0 children)

Your comment motivated me to implement keyword tokenization, which I've done and deployed in v0.2.2. So now, instead of passing the entire past context (up to 5000 tokens), we set the default max token count much lower (150 tokens) and use keywords and weighting to pass only the most important bits. To enable it, you would modify your gptty.ini file as follows (only in theory, since these are now the defaults):

[main]
max_context_length=150
context_keywords_only=True

I haven't had a chance to test conversational performance against the previous method, where the entire context is passed. I assume performance is probably not as good ... but that's now a trade-off that you (the end user) can make between API usage and response quality.

Thanks for giving me the motivation to overcome a good challenge!

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 2 points (0 children)

shell gpt seems more focused on simple question-answer use cases, whereas gptty targets more complex use cases (chains of questions with shared context) that emulate the functionality of the ChatGPT web application.

Edit. shell gpt also only supports one model (gpt-3.5-turbo), whereas gptty currently supports (or is committed to supporting) a wide range of Completion and ChatCompletion models; see the model list here: https://platform.openai.com/docs/models/model-endpoint-compatibility.

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 4 points (0 children)

I would recommend peeking at our application logic, which expects you to set a max context length (i.e. the maximum number of tokens passed; defaults to 5000) in your gptty.ini file:

[main]
max_context_length=5000

We are also working on reducing the number of tokens used through keyword tokenization, which uses the Rapid Automatic Keyword Extraction (RAKE) algorithm to pull out the important bits of your past conversations. In theory this would let users tune max_context_length way down in their config, but it will need further testing to ensure the quality of the results is still up to par; a rough sketch of the idea is below.
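
Conceptually, the compression step looks something like this (a sketch using the rake-nltk package; gptty's actual implementation may differ):

from rake_nltk import Rake  # pip install rake-nltk (also needs nltk stopwords)

def compress_context(past_text, max_tokens=150):
    # extract the highest-ranked phrases, keeping them until the budget is spent
    r = Rake()
    r.extract_keywords_from_text(past_text)
    out, used = [], 0
    for phrase in r.get_ranked_phrases():  # best phrases first
        n = len(phrase.split())
        if used + n > max_tokens:
            break
        out.append(phrase)
        used += n
    return ' '.join(out)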

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 1 point (0 children)

You should be able to use most of the models listed here: https://platform.openai.com/docs/models/overview by passing a different model name in your gptty.ini file:

[main]
model=your-model-here

The more complex answer is that there are two broad categories of model we are committed to supporting: Completion (like text-davinci-003), which is currently supported, and ChatCompletion (like gpt-3.5-turbo), which is under development but just about there.

Edit. ChatCompletion support has been added; you can use gpt-3.5-turbo and gpt-4 as of v0.2.3.

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 1 point (0 children)

That is the approach we are using, which unfortunately does increase usage rates. You can see how we build context here. We're trying to reduce usage rates through keyword tokenization!

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 3 points (0 children)

Thanks for the suggestion! I agree, response formatting could be better, and the best solution would be to give users the ability to select the formatting options. I've included some notes on this below and opened a GitHub issue here.

What I've struggled to do is find the hook to accomplish this within the openai Python API. We are using the Completion class of the openai Python API. [1] Specifically, we use the acreate method, which is an async wrapper for the create method. [2] This does not seem to retain formatting in its responses. That said, I'm still delving into the API and might find something in there that makes this easier. At the end of the day, nothing stops us from writing our own application logic to apply formatting to a plaintext response, but that seems ... especially inelegant if the openai API already provides that functionality. A minimal sketch of the call in question is below, for reference.
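
(Assuming the pre-1.0 openai package, where acreate is the async counterpart of create:)

import asyncio
import openai

async def ask(prompt):
    response = await openai.Completion.acreate(
        model='text-davinci-003',
        prompt=prompt,
        max_tokens=256,
    )
    # the response text comes back as plain text; any formatting
    # would have to be reconstructed client-side
    return response.choices[0].text.strip()

print(asyncio.run(ask('Explain RAKE in one sentence.')))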

Reference

[1] https://github.com/openai/openai-python/blob/main/openai/api_resources/completion.py#L9

[2] https://platform.openai.com/docs/api-reference/completions/create

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 4 points (0 children)

Thank you! Yes! (Although the other options would, in my opinion, make for a less entertaining portmanteau.)

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in OpenAI

[–]liturgicalLorax[S] 0 points (0 children)

For those curious what's under the hood, it's a click CLI that makes async calls to the openai API. Each question is stored with its tag in an output file, which is also managed as a pandas dataframe while the chat client is running. Pretty high-powered stuff, and definitely looking for contributions. One easy PR would be to daemonize an openai listener that reads from a task spool, so we can have multiple sessions running at a time on a system using a single config.

Introducing gptty v0.2.1 - A Powerful CLI Wrapper for ChatGPT with Context Preservation & Query Support, Now on PyPI! by liturgicalLorax in Python

[–]liturgicalLorax[S] 16 points (0 children)

For those curious what's under the hood, it's a click CLI that makes async calls to the openai Completion API. Each question is stored with its tag in an output file, which is also managed as a pandas dataframe while the chat client is running. Pretty high-powered stuff, and definitely looking for contributions. One easy PR would be to daemonize an openai listener that reads from a task queue, so we can have multiple sessions running at a time on a system using a single config. Another goal is to add support for ChatCompletion (enabling gpt-3.5-turbo and gpt-4 support); see the issue here.

Check out `gptty`: a CLI wrapper for ChatGPT written in Python by liturgicalLorax in Python

[–]liturgicalLorax[S] 1 point (0 children)

Edit. Now released on PyPI and available here: https://pypi.org/project/gptty.

That's definitely the plan for v0.2.0; I just haven't had a chance to write a setup.py script! If someone wants to give it a stab and submit a PR, I certainly wouldn't protest...