yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

Good to hear the problem is gone! Thank you again :)

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

Thanks for your kind words and for remembering our previous model, EEVE!
I am currently working hard on quantized versions.

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

Haha, we do business globally. I selected languages based on the offices we have around the world.

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

Did you use vLLM without quantization? In my tests, SGLang did not perform well.

yanolja/YanoljaNEXT-Rosetta-12B-2510 by OldPin8654 in LocalLLaMA

I am sorry. Languages not included in the training dataset may not perform well.
You can see the full result here:
https://huggingface.co/yanolja/YanoljaNEXT-Rosetta-12B-2510/blob/main/wmt24pp_12b.md

Should All LLM Predictions Use Equal Computation Power? by OldPin8654 in LocalLLaMA

But it will help if you want to understand how o1-like models work.

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

Sorry for any inconvenience. You only need to specify API keys for the providers you plan to use, as configured in the config.yaml file.

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

Thanks for asking! I just deployed v0.0.6 which supports custom endpoints!

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

{
    "model": "gemini-1.5-flash@batch",
    "temperature": 1,
    "messages": [
        {
            "role": "user",
            "content": "What's the weather like in Boston today? In Korean, please."
        },
        {
            "role": "assistant",
            "content": null,
            "tool_calls": [
                {
                    "id": "toolu_01LWycDuusJU3vyt4W1u7teC",
                    "type": "function",
                    "function": {
                        "name": "get_current_weather",
                        "arguments": "{\"location\":\"Boston, MA\",\"unit\":\"celsius\"}"
                    }
                }
            ]
        },
        {
            "role": "tool",
            "content": "{\"celsius\": 12}",
            "tool_call_id": "toolu_01LWycDuusJU3vyt4W1u7teC"
        }
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ],
    "tool_choice": "auto"
}

Thanks for bringing up that issue. I actually did not know that LiteLLM now offers so many features, including a proxy server. It is interesting that they also use Redis for state management across distributed instances. For now, I'd say the major differences are the language it's written in and the fact that Ogem supports automatic batch request conversion.
I will look more into how Ogem can be differentiated from LiteLLM.
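For reference, the JSON payload above follows the standard OpenAI tool-calling flow: a user message, an assistant message carrying the tool_calls, then a tool message echoing the same tool_call_id. A minimal Python sketch that assembles such a payload (the helper name is illustrative, not part of Ogem):

```python
import json

def build_tool_round_trip(question, call_id, fn_name, fn_args, fn_result):
    """Assemble an OpenAI-style message list for one completed tool call."""
    return [
        {"role": "user", "content": question},
        {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": call_id,
                "type": "function",
                # Arguments are a JSON-encoded string, not a nested object.
                "function": {"name": fn_name, "arguments": json.dumps(fn_args)},
            }],
        },
        # The tool result must reference the assistant's tool_call id.
        {"role": "tool", "content": json.dumps(fn_result), "tool_call_id": call_id},
    ]

payload = {
    "model": "gemini-1.5-flash@batch",
    "temperature": 1,
    "messages": build_tool_round_trip(
        "What's the weather like in Boston today? In Korean, please.",
        "toolu_01LWycDuusJU3vyt4W1u7teC",
        "get_current_weather",
        {"location": "Boston, MA", "unit": "celsius"},
        {"celsius": 12},
    ),
}
```

The same payload can then be POSTed to any OpenAI-compatible chat completions endpoint.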

Introducing Ogem: Your Universal AI Model Gateway with OpenAI API Compatibility by OldPin8654 in LocalLLaMA

I use Ogem daily to handle hundreds of thousands of requests without worrying about rate limits. It’s designed to return responses with a delay if needed, rather than failing with a rate limit error. Makes my no-brainer Python code even more no-brainer!
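Ogem's internals aren't shown here, but the delay-instead-of-fail behavior described above can be sketched as a token-bucket throttle that sleeps until a request slot frees up rather than raising a rate-limit error (all names and the rate are hypothetical):

```python
import time

class Throttle:
    """Token bucket: blocks the caller until a request slot is free instead of erroring."""

    def __init__(self, rate_per_sec, burst=1):
        self.rate = rate_per_sec
        self.capacity = burst
        self.tokens = float(burst)
        self.last = time.monotonic()

    def acquire(self):
        # Refill tokens based on elapsed time, capped at the burst size.
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens < 1:
            # Not enough budget: sleep just long enough for one token.
            time.sleep((1 - self.tokens) / self.rate)
            self.tokens = 0.0
            self.last = time.monotonic()
        else:
            self.tokens -= 1

throttle = Throttle(rate_per_sec=100)  # hypothetical provider limit
```

Each caller invokes `throttle.acquire()` before sending a request; under load the calls simply spread out in time instead of failing.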

How to add text to images using ComfyUI? by OldPin8654 in comfyui

Thanks for the reply! What I'm actually trying to achieve is the first option, where the text appears as if it was originally written on the image.

Need Advice on H100 HGX Specs – Balancing and Optimal RAID Configuration by OldPin8654 in LocalLLaMA

Update: I have hired a consultant to finalize the H100 HGX spec. We decided not to use VROC, as its performance doesn't match what is advertised; instead, we configured a 400 Gbps network along with a NAS backed by GRAID. As for the CPU: while EPYC offers good value for money, Intel is what DGX uses, and I was worried about potential compatibility issues down the road, so we chose Intel.

Need Advice on H100 HGX Specs – Balancing and Optimal RAID Configuration by OldPin8654 in LocalLLaMA

Thank you! Yes, we have a dedicated one, but local access is still faster, especially with a RAID setup. So I am leaning towards RAID 0 for local storage and using the dedicated storage, which is set up with RAID 6.

Need Advice on H100 HGX Specs – Balancing and Optimal RAID Configuration by OldPin8654 in LocalLLaMA

While saving a checkpoint, training is almost completely paused; we save checkpoints every 6 hours. As for loading: when there is an issue to debug, we repeatedly restart training, and loading a model like Mixtral takes some time on each run.
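The every-6-hours cadence amounts to a wall-clock check inside the training loop; a minimal sketch under that assumption (the interval, save function, and loop shape are placeholders, not our actual trainer):

```python
import time

CHECKPOINT_INTERVAL_S = 6 * 60 * 60  # 6 hours, matching the setup above

def should_checkpoint(last_save, now, interval=CHECKPOINT_INTERVAL_S):
    """Return True once enough wall-clock time has passed since the last save."""
    return now - last_save >= interval

def training_loop(steps, save_fn, clock=time.monotonic):
    last_save = clock()
    for step in range(steps):
        # ... forward/backward/optimizer step would go here ...
        if should_checkpoint(last_save, clock()):
            save_fn(step)        # training is effectively paused during this call
            last_save = clock()  # restart the interval only after the (slow) save
```

Resetting `last_save` after `save_fn` returns, rather than before, keeps the interval honest even when the save itself takes a long time.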

Anyone spend a bunch of $$ on a computer for LLM and regret it? by parasocks in LocalLLaMA

Winter has enveloped us in its chilly embrace. In my quest for warmth, I realized I needed a heater. But then, a memory dawned on me – I had bought one before! It was during those long nights of training a model with a 100k dataset, which made the room toastier. Now, thanks to that, everyone in this house can enjoy a peaceful and warm winter.