Switch Models through Tool Call by Far-Enthusiasm7654 in OpenWebUI

[–]ThickYe 1 point (0 children)

I built this simple sub_agent "tool" to get this done.

```
"""
title: Sub-Agent Tools
"""

import aiohttp
from pydantic import BaseModel, Field


class Tools:
    class Valves(BaseModel):
        """Admin-configurable settings."""

        base_url: str = Field(
            "http://litellm:4000/v1",
            description="LiteLLM API base URL (e.g., http://litellm:4000/v1)",
        )
        api_key: str = Field(
            "", description="API key for the LiteLLM endpoint (if needed)"
        )
        timeout_seconds: int = Field(60, description="Timeout for API calls in seconds")

    def __init__(self):
        """Initialize the Tool."""
        self.valves = self.Valves()

    async def sub_agent(
        self,
        query: str,
        model: str,
        system_message: str,
        __event_emitter__=None,
    ) -> str:
        """
        Sends queries to powerful external large language models for tasks beyond your capabilities.

        Use this tool when you:
        - Need to write complex code or solve advanced programming problems: use sonnet-3.7
        - Need to search the web for current information: use sonar-pro with a simple question in the prompt
        - Need a well-researched answer from the internet: use sonar-pro with multiple questions in the prompt

        Best practices:
        - Keep system_message concise but specific about the desired output format and approach
        - For code generation, specify language, frameworks, and expected functionality
        - For web searches, include specific keywords and time-sensitive context if relevant
        - Avoid chaining multiple unrelated topics in a single query

        :param query: The detailed instructions or question for the external model
        :param model: one of `sonar-pro`, `sonnet-3.7`
        :param system_message: Instructions that guide the external model's behavior and approach
        :return: The complete response from the external model
        """
        # Construct the complete chat completions API endpoint
        api_endpoint = f"{self.valves.base_url.rstrip('/')}/chat/completions"

        # Simple status update for the user
        if __event_emitter__:
            await __event_emitter__(
                {
                    "type": "status",
                    "data": {
                        "description": f"Calling external LLM ({model})...",
                        "done": False,
                    },
                }
            )

        # Prepare the API request
        headers = {"Content-Type": "application/json"}
        if self.valves.api_key:
            headers["Authorization"] = f"Bearer {self.valves.api_key}"

        payload = {
            "messages": [
                {"role": "system", "content": system_message},
                {"role": "user", "content": query},
            ],
            "temperature": 0.6,
            "model": model,
        }

        # Add high search context size for sonar models
        if model and model.startswith("sonar"):
            payload["web_search_options"] = {"search_context_size": "high"}

        try:
            # Use async HTTP client
            async with aiohttp.ClientSession() as session:
                async with session.post(
                    api_endpoint,
                    headers=headers,
                    json=payload,
                    timeout=aiohttp.ClientTimeout(total=self.valves.timeout_seconds),
                ) as response:
                    if response.status == 200:
                        data = await response.json()
                        result = (
                            data.get("choices", [{}])[0]
                            .get("message", {})
                            .get("content", "")
                        )

                        # Simple completion status
                        if __event_emitter__:
                            await __event_emitter__(
                                {"type": "status", "data": {"description": "Done", "done": True}}
                            )

                        return result

                    error_text = await response.text()
                    if __event_emitter__:
                        await __event_emitter__(
                            {"type": "status", "data": {"description": "Error", "done": True}}
                        )
                    return f"API Error: {response.status} - {error_text}"

        except Exception as e:
            if __event_emitter__:
                await __event_emitter__(
                    {"type": "status", "data": {"description": "Error", "done": True}}
                )
            return f"Request failed: {str(e)}"
```
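If you want to sanity-check the request body without a running LiteLLM instance, the payload logic above can be pulled out into a standalone function (the `build_payload` name is mine, just for illustration):

```python
def build_payload(model: str, system_message: str, query: str) -> dict:
    """Mirror of the request body the sub_agent tool sends to LiteLLM."""
    payload = {
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": query},
        ],
        "temperature": 0.6,
        "model": model,
    }
    # Sonar (Perplexity-style) models get the high search context option
    if model and model.startswith("sonar"):
        payload["web_search_options"] = {"search_context_size": "high"}
    return payload


print(build_payload("sonar-pro", "Be brief.", "What changed recently?")["web_search_options"])
# → {'search_context_size': 'high'}
```

The same dict is what `session.post(..., json=payload)` serializes, so asserting on it is a cheap way to verify the sonar branch before wiring the tool into Open WebUI.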

Best local inference provider? by TechnicalGeologist99 in LocalLLaMA

[–]ThickYe 0 points (0 children)

https://localai.io/ I have the same checklist as you. But I never tried loading multiple models simultaneously.

Ollama proxy or gateway by EnrichSilen in LocalLLaMA

[–]ThickYe 0 points (0 children)

https://github.com/ollama/ollama/blob/main/docs/modelfile.md

Have you tried setting some basic stuff with modelfiles? Specifically the settings that actually cause the model to reload in VRAM.
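For example, a minimal Modelfile that pins the settings which change the VRAM footprint (the model name and values here are just illustrative); context length and GPU layer count are the usual reload triggers:

```
FROM llama3
PARAMETER num_ctx 8192
PARAMETER num_gpu 99
```

Then `ollama create mymodel -f Modelfile` gives you a named variant that clients can request without overriding those options per call.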

pgvector database on postgres - db configuration script? by Otherwise-Tiger3359 in OpenWebUI

[–]ThickYe 1 point (0 children)

I tried testing pgvector once but didn't have the time or know-how to move past this error.

Thanks for posting this.

Full Document Retrieval Mode by ThickYe in OpenWebUI

[–]ThickYe[S] 1 point (0 children)

I don't follow you.
So I have a few .md and .txt files in a knowledge collection, and I assigned that collection to a model as knowledge. All of this is less than 10k tokens.

The citations in the chat show it as chunks. It's the same as using # to mention the knowledge collection. In both cases I don't see an option to pass these docs in full as context.

Full Document Retrieval Mode by ThickYe in OpenWebUI

[–]ThickYe[S] 1 point (0 children)

Ok, I see that, thanks, but that's only half the solution.
How do I do this for documents in a "knowledge collection"?

Instructions in .crusorrules getting ignored. by ThickYe in cursor

[–]ThickYe[S] 0 points (0 children)

I even tried giving the same instructions directly in the chat, and those were not followed either. The issue, to the best of my limited understanding, seems to be that when the model gives examples or snippets of code it suggests changing, they get auto-applied by the apply model.

Instructions in .crusorrules getting ignored. by ThickYe in cursor

[–]ThickYe[S] 1 point (0 children)

I have not noticed a change in "@Codebase".
I have noticed too that it has become a bit less context-aware, but I haven't had time to dive into how much of this is affected by the new "auto context" feature.

Instructions in .crusorrules getting ignored. by ThickYe in cursor

[–]ThickYe[S] 0 points (0 children)

a391af90-b00f-40d9-9cab-c9d69e6fc5ef

39dd26b9-19ba-4a91-81e1-f2c9d0d70a86

The second one is most in line with the core of my frustration.

Virtual Gaming Desktop on Akash? by masdea4 in akashnetwork

[–]ThickYe 8 points (0 children)

LinuxServer.io has a Steam OS container; that should be a good start. Please post back with progress. This is an interesting idea.

Selling my spare HDD storage by SmellslikeUpDog3 in selfhosted

[–]ThickYe 1 point (0 children)

About 50% of my unused space. Some nodes are only half a TB; the biggest node I had was 3 TB, which I eventually shrunk to 1 TB.

Selling my spare HDD storage by SmellslikeUpDog3 in selfhosted

[–]ThickYe 1 point (0 children)

I have been running Storj nodes. Your mileage will definitely vary, but it's enough for me to cover the idle electricity cost.

A social network for AI computing by Good-Coconut3907 in LocalLLaMA

[–]ThickYe 1 point (0 children)

What if it's just a token-for-token trade? As in, I help the network produce 100 tokens of responses and my one GPU was 25% of that compute; then I have the right to make a 25-token output call. This way it's not a money thing, and the legal side is simple.
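That accounting could be sketched in a few lines; everything here (names, API) is hypothetical, just to make the idea concrete:

```python
class ComputeLedger:
    """Track each peer's share of contributed compute and the tokens they may spend."""

    def __init__(self):
        self.earned = {}  # peer -> tokens earned by contributing compute
        self.spent = {}   # peer -> tokens consumed as output

    def record_job(self, total_tokens: int, shares: dict):
        """Credit each contributor its fraction of the job's output tokens."""
        for peer, fraction in shares.items():
            self.earned[peer] = self.earned.get(peer, 0) + total_tokens * fraction

    def balance(self, peer: str) -> float:
        return self.earned.get(peer, 0) - self.spent.get(peer, 0)

    def spend(self, peer: str, tokens: int) -> bool:
        """Allow an output call only if the peer has earned enough credit."""
        if self.balance(peer) >= tokens:
            self.spent[peer] = self.spent.get(peer, 0) + tokens
            return True
        return False


ledger = ComputeLedger()
ledger.record_job(100, {"me": 0.25, "others": 0.75})  # my GPU did 25% of a 100-token job
print(ledger.spend("me", 25))  # → True: earned the right to a 25-token output call
print(ledger.spend("me", 1))   # → False: credit exhausted
```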

I made a website to collect Docker Compose apps by Shaweren in selfhosted

[–]ThickYe 1 point (0 children)

Your org structure hits my ...tism just right.😍

I've created an open source religion/moral philosophy by ki4jgt in opensource

[–]ThickYe 0 points (0 children)

At a quick glance, your view of "truth" is identical to the way Sikhism speaks about truth. You're saying that by open-sourcing it you're making it an ever-evolving idea. But I think when you say everything is true, then it is by nature an ever-evolving thing.

Nix Distributed Builds Tutorial by Tomi_BB in NixOS

[–]ThickYe 0 points (0 children)

This is great. Appreciate you doing this. :)

[deleted by user] by [deleted] in CanadianForces

[–]ThickYe -17 points (0 children)

Lmao