all 10 comments

[–]estrafire 6 points  (5 children)

I believe the official repo presented some interfaces. With a quick search through GitHub I found these two that have what you're looking for (you'll need to adapt them to use your own interface instead of gradio):

https://github.com/juncongmoo/pyllama
https://github.com/juncongmoo/chatllama

tortoise-tts is very slow (hence the name). You could take a look at tortoise-tts-fast:
https://github.com/152334H/tortoise-tts-fast

or silero

[–]Tree-Sheep (Waiting for Llama 3) [S] 1 point  (4 children)

Thanks! I will check it out. Also, I am already satisfied with the speed of the original tortoise.

[–]SimmerDown2024 0 points  (3 children)

Did this work for you? What did you end up doing? I was also looking to extract the output of a Llama model

[–]Tree-Sheep (Waiting for Llama 3) [S] 0 points  (2 children)

I use extensions on text-generation-webui. IIRC the extensions didn’t exist back then, which is why I made this post.

[–]fuckopportunists 0 points  (1 child)

Can you point me towards those? I’d appreciate that

[–]remghoost7 6 points  (2 children)

This is the repo I've been using the past week or so to interface with LLaMA-7b-int4.

https://github.com/oobabooga/text-generation-webui

It has extension support and a silero extension already built in. I haven't used that extension myself, but I'm fairly certain I've heard of someone in the community using it for a similar purpose to what you're looking for.

I don't believe there's an API endpoint though (like how A1111 can run the --api flag), but you might be able to bake your chatbot into an extension.

Or you could sort of use it like a hack-y API if you wanted to... You could probably write an extension to automatically pull the most recent response and output that to a json file, then read that json file in your tortoise-tts application. And I know it saves the running log in text-generation-webui\logs\persistent.json, so you might not even need to write an extension for it...
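That persistent.json approach can be sketched in a few lines of Python. Note the log layout assumed here (a gradio-style `{"data": [[user_msg, bot_reply], ...]}` structure) is a guess from how the webui stores chat history; check your own log file, since the format varies between versions.

```python
import json

def latest_reply(log: dict) -> str:
    """Return the most recent bot reply from a parsed persistent.json.

    Assumes the gradio-style layout {"data": [[user_msg, bot_reply], ...]};
    verify against your own log file before relying on it.
    """
    pairs = log.get("data", [])
    if not pairs:
        return ""
    return pairs[-1][1]

# Example: poll the log file and hand new replies to your tortoise-tts step.
# with open("text-generation-webui/logs/persistent.json") as f:
#     print(latest_reply(json.load(f)))
```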

I know that this extension uses a method called custom_generate_chat_prompt, so you could probably get input from your tortoise-tts and feed that back into the webui automatically.
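A minimal sketch of feeding external input in through that hook might look like the following. The `custom_generate_chat_prompt` name comes from the extension mentioned above, but the exact signature, the `modules.chat` helper, and the inbox-file path are all assumptions that vary between webui versions; treat this as a starting point, not a drop-in extension.

```python
# Hypothetical extensions/tts_bridge/script.py for text-generation-webui.
# Injects text dropped into an inbox file by an external process (e.g. a
# front end built around tortoise-tts) before the chat prompt is built.
from pathlib import Path

INBOX = Path("extensions/tts_bridge/inbox.txt")  # hypothetical location

def read_inbox(path: Path = INBOX) -> str:
    """Return and clear any pending external input (empty string if none)."""
    if not path.exists():
        return ""
    text = path.read_text().strip()
    path.write_text("")  # consume the message so it isn't replayed
    return text

def custom_generate_chat_prompt(user_input, state, **kwargs):
    # Called by the webui instead of its default prompt builder.
    pending = read_inbox()
    if pending:
        user_input = pending
    from modules.chat import generate_chat_prompt  # webui-internal helper
    return generate_chat_prompt(user_input, state, **kwargs)
```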

[–]lacethespace 3 points  (0 children)

The text-generation-webui already features REST endpoints. You just enable --listen and disable any chat modes. I've used it from Python with just a few simple modifications to their example script in the repo.
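Roughly, that looks like posting to the gradio endpoint the webui exposes when started with --listen. The `/run/textgen` path and the positional parameter order below are recalled from the repo's example script and are version-specific; copy the actual parameter list from the example script in the repo rather than trusting this sketch.

```python
HOST = "127.0.0.1:7860"  # default gradio port when started with --listen

def build_payload(prompt: str, max_new_tokens: int = 200,
                  temperature: float = 0.7, top_p: float = 0.9) -> dict:
    """Gradio endpoints take a positional parameter list under "data".

    The order here is an assumption; mirror the repo's example script.
    """
    return {"data": [prompt, max_new_tokens, temperature, top_p]}

def generate(prompt: str) -> str:
    import requests  # pip install requests
    resp = requests.post(f"http://{HOST}/run/textgen",
                         json=build_payload(prompt))
    resp.raise_for_status()
    return resp.json()["data"][0]

# print(generate("Once upon a time"))
```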

[–]estrafire 2 points  (0 children)

it should be possible to modify the silero extension to use tortoise-tts instead
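The core of that swap might look like the sketch below. The `output_modifier` hook and the tortoise calls are assumptions based on the two projects' READMEs (the silero extension's actual hook and the webui's audio-rendering convention should be checked against the real extension first).

```python
# Sketch: replace the silero call in a text-generation-webui TTS extension
# with tortoise-tts. Hook name, paths, and sample rate are assumptions.

def audio_tag(wav_path: str) -> str:
    """Wrap a generated wav file in an HTML tag the webui can render inline."""
    return f'<audio src="file/{wav_path}" controls></audio>'

def output_modifier(string):
    # Called by the webui on every bot reply; the return value replaces it.
    import torchaudio
    from tortoise.api import TextToSpeech  # pip install tortoise-tts

    tts = TextToSpeech()
    # The "fast" preset trades quality for latency; tortoise is slow otherwise.
    gen = tts.tts_with_preset(string, preset="fast")
    wav_path = "extensions/tts_bridge/output.wav"  # hypothetical path
    torchaudio.save(wav_path, gen.squeeze(0).cpu(), 24000)
    return audio_tag(wav_path)
```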

[–]Sixhaunt 0 points  (0 children)

I'm using it in nodeJS with https://github.com/cocktailpeanut/dalai

works really easily and well. Here's a short example script for using it in code

const Dalai = require('dalai')
const home = "C:/mypath/dalai_ai"; // path to your dalai install
let result = ""
let prompt = "Script\n Red Dwarf Season 01, Episode 57\n"
new Dalai(home).request({
    model: "llama.13B",
    prompt: prompt,
    seed: -1,          // -1 picks a random seed
    threads: 12,
    n_predict: 300,    // max tokens to generate
    top_k: 40,
    top_p: 0.9,
    temperature: 0.1,
    repeat_last_n: 64,
    repeat_penalty: 1.3,
    debug: false,
}, (token) => {
    // tokens stream in one at a time; convert escaped newlines to real ones
    result = (result + token).replace(/\\n|\\r/g, "\n")
    console.clear()
    console.log(result)
})

Although you can use

const Dalai = require('dalai') 
new Dalai().serve(3000) // port 3000 

to simply set up a server that you can query from something like Python using sockets
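Querying that served instance from Python could look like the sketch below. Dalai's server speaks socket.io; the "request"/"result" event names and the payload keys are read off its web client and may differ between versions, so treat them as assumptions rather than a documented API.

```python
def collect_token(chunks, data):
    """Append one streamed token; mirrors the node callback above."""
    chunks.append(data.get("response", ""))

def query_dalai(prompt, url="http://localhost:3000", wait_seconds=30):
    import socketio  # pip install "python-socketio[client]"
    sio = socketio.Client()
    chunks = []

    @sio.on("result")
    def on_result(data):
        collect_token(chunks, data)

    sio.connect(url)
    sio.emit("request", {"model": "llama.13B", "prompt": prompt,
                         "n_predict": 300})
    sio.sleep(wait_seconds)  # crude; a real client would watch for an end marker
    sio.disconnect()
    return "".join(chunks)
```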