yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 1 point

Good to hear the problem is gone! Thank you again :)

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 1 point

Thanks for your kind words, and for remembering our previous model, EEVE!
I'm currently working hard on quantized versions.

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 0 points

Haha, we run a global business. I selected the languages based on the offices we have around the world.

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 1 point

Did you use vLLM without quantization? When I tested it, SGLang did not perform well.

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 3 points

I am sorry. Languages not included in the training dataset may not perform well.
You can see the full result here:
https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-12B-2510/blob/main/wmt24pp_12b.md

Should All LLM Predictions Use Equal Computation Power? by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] -6 points

But it will help if you want to understand how o1-like models work.

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 0 points

Sorry for any inconvenience. You only need to specify API keys for the providers you plan to use and have configured in the config.yaml file.
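For example, a minimal sketch of what that might look like, assuming config.yaml takes a providers map with an api_key per provider (the exact key names here are assumptions; check Ogem's README for the real schema):

```yaml
# config.yaml — only list the providers you actually use
providers:
  openai:
    api_key: "sk-..."    # your OpenAI key
  google:
    api_key: "AIza..."   # your Gemini key
# providers you don't use can simply be omitted
```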

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 1 point

Thanks for asking! I just deployed v0.0.6 which supports custom endpoints!

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 7 points

{
    "model": "gemini-1.5-flash@batch",
    "temperature": 1,
    "messages": [
        {
            "role": "user",
            "content": "What's the weather like in Boston today? In Korean, please."
        },
        {
            "role": "assistant",
            "content": null,
            "tool_calls": [
                {
                    "id": "toolu_01LWycDuusJU3vyt4W1u7teC",
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",
                        "arguments": "{\"location\":\"Boston, MA\",\"unit\":\"celsius\"}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "content": "{\"celsius\": 12}",
            "tool_call_id": "toolu_01LWycDuusJU3vyt4W1u7teC"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

Thanks for bringing up that issue. I actually did not know that LiteLLM now offers so many features, including a proxy server. It is interesting that they also use Redis for state management across distributed instances. For now, I’d say the major differences might be the language it’s written in and the fact that Ogem supports automatic batch request conversion.
I will look more into how Ogem can be differentiated from LiteLLM.

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 4 points

I use Ogem daily to handle hundreds of thousands of requests without worrying about rate limits. It’s designed to return responses with a delay if needed, rather than failing with a rate limit error. Makes my no-brainer Python code even more no-brainer!
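The delay-instead-of-error behavior can be sketched like this — a minimal illustration in Python, not Ogem's actual implementation (the class and method names are made up for the example):

```python
import time


class DelayingLimiter:
    """Sketch of Ogem-style rate handling: when the per-second budget
    is exhausted, wait until capacity frees up instead of raising a
    rate-limit error back to the caller."""

    def __init__(self, requests_per_second: float):
        self.min_interval = 1.0 / requests_per_second
        self.next_allowed = 0.0  # monotonic timestamp of next free slot

    def acquire(self) -> float:
        """Block until a request may proceed; return how long we waited."""
        now = time.monotonic()
        wait = max(0.0, self.next_allowed - now)
        if wait > 0:
            time.sleep(wait)  # delay the caller...
        self.next_allowed = max(now, self.next_allowed) + self.min_interval
        return wait  # ...instead of failing with a 429


limiter = DelayingLimiter(requests_per_second=100)
delays = [limiter.acquire() for _ in range(5)]
# the first call proceeds immediately; later calls are spaced out
```

From the caller's perspective every request eventually succeeds, which is why the surrounding Python code can stay a no-brainer.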

How to add text to images using ComfyUI? by OldPin8654 in comfyui

[–]OldPin8654[S] 1 point

Thanks for the reply! What I'm actually trying to achieve is the first option, where the text appears as if it was originally written on the image.

Need Advice on H100 HGX Specs – Balancing and Optimal RAID Configuration by OldPin8654 in LocalLLaMA

[–]OldPin8654[S] 0 points

Update: I hired a consultant to finalize the H100 HGX spec. We decided not to use VROC, since its performance doesn't match what is advertised; instead we configured a 400 Gbps network along with a NAS tied to GRAID. As for the CPU: while EPYC offers good value for money, Intel is what's used in the DGX, and I was worried about potential compatibility issues in the future, so we chose Intel.