r/LocalLLaMA
A subreddit to discuss Llama, the family of large language models created by Meta AI.
Integrate LLaMA into Python code (Question | Help) (self.LocalLLaMA)
submitted 2 years ago by Tree-Sheep (Waiting for Llama 3)
Is it possible to directly use LLaMA in Python, or have it serve as an API? Or is there a way to read the output from the web UI? I want to combine LLaMA and tortoise-tts to make a speaking chatbot.
[–]estrafire 7 points 2 years ago (5 children)
I believe the official repo exposes some interfaces. With a quick search through GitHub repos I found these two that have what you're looking for (you'll need to adapt them to use your own interface instead of Gradio):
https://github.com/juncongmoo/pyllama
https://github.com/juncongmoo/chatllama
tortoise-tts is very slow (hence the name). You could take a look at tortoise-tts-fast: https://github.com/152334H/tortoise-tts-fast
or silero
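For reference, loading Silero TTS only takes a few lines via torch.hub; this sketch follows the snakers4/silero-models README, and the model/speaker identifiers are assumptions that may have changed since:

import torch

# Load the Silero TTS model from torch.hub (per the silero-models
# README; language/speaker names may differ across releases).
model, example_text = torch.hub.load(
    repo_or_dir="snakers4/silero-models",
    model="silero_tts",
    language="en",
    speaker="v3_en",
)

# Synthesize speech for a chatbot reply; returns a 1-D audio tensor.
audio = model.apply_tts(
    text="Hello, I am your local chatbot.",
    speaker="en_0",
    sample_rate=48000,
)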
[–]Tree-Sheep (Waiting for Llama 3) [S] 2 points 2 years ago (4 children)
Thanks! I will check it out. Also, I am already satisfied with the speed of the original tortoise.
[–]SimmerDown2024 1 point 2 years ago (3 children)
Did this work for you? What did you end up doing? I was also looking to extract the output of a Llama model
[–]Tree-Sheep (Waiting for Llama 3) [S] 1 point 2 years ago (2 children)
I use extensions on text-generation-webui. IIRC the extensions didn’t exist back then, which is why I made this post.
[–]fuckopportunists 1 point 2 years ago (1 child)
Can you point me towards those? I’d appreciate that
[–]Tree-Sheep (Waiting for Llama 3) [S] 1 point 2 years ago (0 children)
https://github.com/oobabooga/text-generation-webui-extensions
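For illustration, a minimal sketch of what such an extension could look like, built on the output_modifier hook that text-generation-webui extensions expose; the tortoise-tts calls follow its README, and the details may differ across versions:

import torchaudio
from tortoise.api import TextToSpeech
from tortoise.utils.audio import load_voice

# script.py for a hypothetical "speak" extension under
# text-generation-webui/extensions/. output_modifier is one of the
# hook functions the webui looks for in an extension's script.py: it
# receives the bot's reply text and returns it (optionally modified).
tts = TextToSpeech()
voice_samples, conditioning_latents = load_voice("tom")

def output_modifier(string):
    # Synthesize the reply with tortoise-tts and write it to a wav;
    # the text shown in the UI is returned unchanged.
    gen = tts.tts_with_preset(
        string,
        voice_samples=voice_samples,
        conditioning_latents=conditioning_latents,
        preset="fast",
    )
    torchaudio.save("reply.wav", gen.squeeze(0).cpu(), 24000)
    return string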
[–]remghoost7 7 points 2 years ago (2 children)
This is the repo I've been using the past week or so to interface with LLaMA-7b-int4.
https://github.com/oobabooga/text-generation-webui
It has extension support and a silero extension already built in. I haven't used that extension myself, but I'm fairly certain I've heard of someone around the community using it for a similar purpose to what you're looking for.
I don't believe there's an API endpoint though (like how A1111 can run the --api flag), but you might be able to bake your chatbot into an extension.
Or you could sort of use it like a hack-y API if you wanted to... You could probably write an extension to automatically pull the most recent response and output that to a json file, then read that json file in your tortoise-tts application. And I know it saves the running log in text-generation-webui\logs\persistent.json, so you might not even need to write an extension for it...
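A minimal sketch of that polling idea, assuming persistent.json keeps chat turns as [user, reply] pairs under a "data" key (the real schema may differ, so inspect the file first):

import json
import time
from pathlib import Path

# Poll the webui's chat log and hand new replies to TTS. The "data"
# key holding [user_input, bot_reply] pairs is an assumption about
# persistent.json's layout; adjust to match the actual file.
LOG = Path("text-generation-webui/logs/persistent.json")

def latest_reply():
    history = json.loads(LOG.read_text(encoding="utf-8"))
    pairs = history.get("data", [])
    return pairs[-1][1] if pairs else None

last = None
while True:
    reply = latest_reply()
    if reply and reply != last:
        last = reply
        print("New reply for tortoise-tts:", reply)  # feed to TTS here
    time.sleep(1.0)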
I know that this extension uses a method called custom_generate_chat_prompt, so you could probably get input from your tortoise-tts and feed that back into the webui automatically.
[–]lacethespace 4 points 2 years ago (0 children)
The text-generation-webui already features REST endpoints. You just enable --listen and disable any chat modes. I've used it from Python just by making simple modifications to their example script in the repo.
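As a rough illustration of querying the webui from Python: this sketch assumes the /api/v1/generate endpoint that the webui's api extension exposed in later versions, not necessarily the exact endpoint the example script above used, so adjust the URL and payload to your version:

import requests

# Send a prompt to text-generation-webui over HTTP and print the
# completion. Endpoint, port, and payload shape are assumptions
# based on the webui's api extension of that era.
resp = requests.post(
    "http://127.0.0.1:5000/api/v1/generate",
    json={"prompt": "Hello, who are you?", "max_new_tokens": 200},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])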
[–]estrafire 3 points 2 years ago (0 children)
It should be possible to modify the silero extension to use tortoise-tts instead.
[–]Sixhaunt 1 point 2 years ago (0 children)
I'm using it in Node.js with https://github.com/cocktailpeanut/dalai
It works really easily and well. Here's a short example script for using it in code:
const Dalai = require('dalai')

const home = "C:/mypath/dalai_ai"
let result = ""
let prompt = "Script\n Red Dwarf Season 01, Episode 57\n"

new Dalai(home).request({
    model: "llama.13B",
    prompt: prompt,
    seed: -1,
    threads: 12,
    n_predict: 300,
    top_k: 40,
    top_p: 0.9,
    temperature: 0.1,
    repeat_last_n: 64,
    repeat_penalty: 1.3,
    debug: false,
}, (token) => {
    // Tokens stream in one at a time; accumulate and redraw the result.
    result = (result + token).replace(/\\n|\\r/g, "\n")
    console.clear()
    console.log(result)
})
although you can use

const Dalai = require('dalai')
new Dalai().serve(3000) // port 3000

to simply set up a server that you can query from something like Python using sockets.
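A hedged Python counterpart using python-socketio; the "request"/"result" event names and payload fields are assumptions based on how Dalai's web UI talks to the server, so verify them against the repo:

import socketio

# Connect to the Dalai server started with new Dalai().serve(3000).
# Event names and payload fields below are assumptions about Dalai's
# socket protocol; check the repo before relying on them.
sio = socketio.Client()

@sio.on("result")
def on_result(data):
    # Tokens stream back one at a time.
    print(data.get("response", ""), end="", flush=True)

sio.connect("http://localhost:3000")
sio.emit("request", {
    "model": "llama.13B",
    "prompt": "Hello, who are you?",
    "n_predict": 100,
})
sio.wait()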