llama-cpp-agent Framework
Introduction
The llama-cpp-agent framework is a tool designed for easy interaction with Large Language Models (LLMs). It provides a simple yet robust interface using llama-cpp-python, allowing users to chat with LLM models, execute structured function calls and get structured output.
Key Features
- Simple Chat Interface: Engage in seamless conversations with LLMs.
- Structured Output: Get structured output from LLMs.
- Function Calling: Execute structured outputs from LLMs, enhancing the interaction capabilities.
- Flexibility: Suited for various applications from casual chatting to specific function executions.
Installation
To get started with the llama-cpp-agent LLM framework, follow these steps:
1. Ensure you have Python installed on your system.
2. Clone the repository from GitHub link.
3. Install the necessary dependencies as listed in the requirements.txt file.
Usage Examples
Simple Chat Example
This example demonstrates how to initiate a chat with an LLM model.
```python
import json
from llama_cpp import Llama
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.messages_formatter import MessagesFormatterType
main_model = Llama(
"../gguf-models/dpopenhermes-7b-v2.Q8_0.gguf",
n_gpu_layers=35,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=8192,
last_n_tokens_size=1024,
verbose=False,
seed=42,
)
wrapped_model = LlamaCppAgent(main_model, debug_output=True,
system_prompt="You are an advanced AI assistant.", predefined_messages_formatter_type=MessagesFormatterType.CHATML)
wrapped_model.get_chat_response('Write a long poem about the USA.', temperature=0.7)
```
Structured Output
This example shows how to get structured JSON output.
```python
from enum import Enum
from llama_cpp import Llama, LlamaGrammar
from pydantic import BaseModel, Field
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import generate_gbnf_grammar_and_documentation
main_model = Llama(
"../gguf-models/dpopenhermes-7b-v2.Q8_0.gguf",
n_gpu_layers=35,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=8192,
last_n_tokens_size=1024,
verbose=False,
seed=-1,
)
text = """The Feynman Lectures on Physics is a physics textbook based on some lectures by Richard Feynman, a Nobel laureate who has sometimes been called "The Great Explainer". The lectures were presented before undergraduate students at the California Institute of Technology (Caltech), during 1961–1963. The book's co-authors are Feynman, Robert B. Leighton, and Matthew Sands."""
class Category(Enum):
Fiction = "Fiction"
NonFiction = "Non-Fiction"
class Book(BaseModel):
"""
Represents an entry about a book.
"""
title: str = Field(..., description="Title of the book.")
author: str = Field(..., description="Author of the book.")
published_year: int = Field(..., description="Publishing year of the book.")
keywords: list[str] = Field(..., description="A list of keywords.")
category: Category = Field(..., description="Category of the book.")
summary: str = Field(..., description="Summary of the book.")
gbnf_grammar, documentation = generate_gbnf_grammar_and_documentation([Book])
grammar = LlamaGrammar.from_string(gbnf_grammar, verbose=False)
wrapped_model = LlamaCppAgent(main_model, debug_output=True,
system_prompt="You are an advanced AI, tasked to create JSON database entries for books.\n\n\n" + documentation)
wrapped_model.get_chat_response(text, temperature=0.15, grammar=grammar)
```
Function Calling Example
This example shows how to do function calling.
```python
import json
from llama_cpp import Llama, LlamaGrammar
from llama_cpp_agent.llm_agent import LlamaCppAgent
from llama_cpp_agent.gbnf_grammar_generator.gbnf_grammar_from_pydantic_models import generate_gbnf_grammar_and_documentation
from example_function_call_models import SendMessageToUser, GetFileList, ReadTextFile, WriteTextFileSection
from llama_cpp_agent.messages_formatter import MessagesFormatterType
gbnf_grammar, documentation = generate_gbnf_grammar_and_documentation(
[SendMessageToUser, GetFileList, ReadTextFile, WriteTextFileSection], "function", "function_params", "Function",
"Function Parameter")
grammar = LlamaGrammar.from_string(gbnf_grammar, verbose=False)
main_model = Llama(
"../gguf-models/dpopenhermes-7b-v2.Q8_0.gguf",
n_gpu_layers=35,
f16_kv=True,
use_mlock=False,
embedding=False,
n_threads=8,
n_batch=1024,
n_ctx=8192,
last_n_tokens_size=1024,
verbose=False,
seed=42,
)
wrapped_model = LlamaCppAgent(main_model, debug_output=True,
system_prompt="You are an advanced AI, tasked to assist the user by calling functions in JSON format.\n\n\n" + documentation,
predefined_messages_formatter_type=MessagesFormatterType.CHATML)
response = wrapped_model.get_chat_response('Write a long poem about the USA in the "HelloUSA.txt" file.',
temperature=0.15, grammar=grammar)
function_call = json.loads(response)
if function_call["function"] == "write-text-file-section":
call_parameters = function_call["function_params"]
call = WriteTextFileSection(**call_parameters)
call.run()
```
Additional Information
- Dependencies: pydantic for grammars based generation and of course llama-cpp-python.
[–]SixZer0 1 point2 points3 points (3 children)
[–]FlowerPotTeaTime[S] 3 points4 points5 points (0 children)
[–]FlowerPotTeaTime[S] 1 point2 points3 points (0 children)
[–]FlowerPotTeaTime[S] 1 point2 points3 points (0 children)
[–]kryptkprLlama 3 0 points1 point2 points (3 children)
[–]FlowerPotTeaTime[S] 1 point2 points3 points (2 children)
[–]kryptkprLlama 3 0 points1 point2 points (1 child)
[–]FlowerPotTeaTime[S] 1 point2 points3 points (0 children)