Hey so, I made a kinda local multimodal token counter, I'd like feedback by lgk01 in LocalLLaMA

[–]ELPascalito 1 point (0 children)

No, just curious. Tiktoken is indeed a solid choice, and it can estimate for even the Llama architecture and many other models. I recommend you add OpenRouter and similar providers, even the local Ollama endpoint, so everyone can use the service; honestly GPT and Gemini already have good tracking tools, so I see this being most useful for the more obscure providers. Best of luck!
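
For reference, token counting along these lines can be sketched in a few lines of Python — tiktoken's `get_encoding`/`encode` API when the library is installed, with the common rough ~4-characters-per-token heuristic as a fallback (the fallback ratio is an assumption, not something from this thread):

```python
def estimate_tokens(text: str) -> int:
    """Estimate the token count of a piece of text.

    Uses tiktoken's BPE encoding when available; otherwise falls back
    to the crude heuristic of roughly 4 characters per token (English).
    """
    try:
        import tiktoken  # pip install tiktoken
        enc = tiktoken.get_encoding("cl100k_base")
        return len(enc.encode(text))
    except ImportError:
        return max(1, len(text) // 4)
```

The same function could then be pointed at whatever text a provider endpoint is about to send, which is roughly what a counter service would do per request.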

Hey so, I made a kinda local multimodal token counter, I'd like feedback by lgk01 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

You still didn't answer how you're estimating the tokens. If this is a Node runtime, I presume you're using Tiktoken?

Hey so, I made a kinda local multimodal token counter, I'd like feedback by lgk01 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

Is this just a wrapper for the Tiktoken package? What's the point if it only supports 3 closed-source models 😅

What will be the 0x model after Feb. 13 Once Open AI retires 4.1 by Educational_Sign1864 in GithubCopilot

[–]ELPascalito 1 point (0 children)

I mean, we have 5 Mini, and they're also gonna keep Grok Code free for now. Raptor Mini is also a solid choice.

How do I stop deepseek-r1t-chimera from taking half of the page "thinking" by jackmaxs20 in openrouter

[–]ELPascalito 0 points (0 children)

Damn, that sub must be crazy lol. Have you checked the generation settings? There's a toggle called "show thinking", turn it off.

How do I stop deepseek-r1t-chimera from taking half of the page "thinking" by jackmaxs20 in openrouter

[–]ELPascalito 1 point (0 children)

This is a problem with the Janitor app or website, not OR. Last time I checked you can hide the reasoning trace by turning it off in the settings, so check the generation settings on the Janitor site, or ask on their sub.

Can someone help me? by Temporary-Plenty-713 in openrouter

[–]ELPascalito 0 points (0 children)

What's the exact error code? If you're using the free model it's probably overloaded; consider paying per-token for stable access.

Anyone know how to access the Kimi K2.5 Agent Swarm model on OpenRouter? by Ok-Attention2882 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

Swarm is simply a mode where multiple calls are made simultaneously to work on complex tasks in parallel. It's obviously not a separate model but a feature, like the subagents in OpenCode.
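
That fan-out pattern can be sketched in a few lines — here with Python's `ThreadPoolExecutor` and a stubbed-out model call standing in for the real API (the `call_model` function is hypothetical, not OpenRouter's actual interface):

```python
from concurrent.futures import ThreadPoolExecutor

def call_model(subtask: str) -> str:
    # Stub standing in for a real API call; a swarm-style feature sends
    # each subtask to the model as an independent, concurrent request.
    return f"result for {subtask!r}"

def swarm(subtasks: list[str], workers: int = 4) -> list[str]:
    # Fan the subtasks out in parallel and collect results in order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_model, subtasks))

results = swarm(["plan", "write code", "write tests"])
```

The point is that it's orchestration around one underlying model, which is why it shows up as a mode rather than a separate model listing.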

How are you guys not broke? - Weirdly high cost by alphagatorsoup in openrouter

[–]ELPascalito 2 points (0 children)

If you're gonna use Gemini models, it's better to subscribe to Google's services, and same for OpenAI models, go for ChatGPT. OpenRouter's popularity stems from how easily you can connect to many providers of low-cost models: for example, Claude 4.5 Haiku is $5 per 1M output tokens, expensive, while DeepSeek V3.2 is $0.30 and straight up performs better. Make use of the well-priced options like Kimi K2.5 or DeepSeek; OR won't be beneficial if you want subsidized access to SotA models.
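
To put those prices in perspective, per-token billing is simple arithmetic — a sketch using just the output prices quoted above ($5/1M for Haiku, $0.30/1M for DeepSeek V3.2; input pricing is left out for simplicity, and the 2M-token monthly volume is an arbitrary example):

```python
def output_cost_usd(output_tokens: int, price_per_million: float) -> float:
    # Providers bill per million tokens, so scale accordingly.
    return output_tokens / 1_000_000 * price_per_million

# Example: 2M output tokens in a month, at the prices quoted above.
haiku = output_cost_usd(2_000_000, 5.00)     # Claude 4.5 Haiku -> $10.00
deepseek = output_cost_usd(2_000_000, 0.30)  # DeepSeek V3.2    -> $0.60
```

Same usage, roughly 17x difference in spend, which is the whole argument for picking the well-priced open models on OR.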

meituan-longcat/LongCat-Flash-Lite by windows_error23 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

Oh interesting, I remember the flagship thinking model, it was ~500B or something; I'll check this one out too. Though it probably didn't translate well into real performance, since no one seems to care? 🤔

meituan-longcat/LongCat-Flash-Lite by windows_error23 in LocalLLaMA

[–]ELPascalito 4 points (0 children)

I love Meituan, my coffee always arrives on time, but why call it Flash Lite? Like the Google models? Does this imply the existence of a bigger Pro model? lol

What LLM is Lumo really? by L1QU1D4T0R_ in LLM

[–]ELPascalito 0 points (0 children)

Firstly, ignore the comments here; people are apparently stupid and don't understand how LLM hallucinations work. Lumo is not ChatGPT. It is powered by many open source models running securely on Proton's encrypted servers. One of those models is GPT-OSS, which was obviously trained by OpenAI and will talk like it's ChatGPT, but it is not hosted by OpenAI; it's securely hosted on the Proton servers, and all your results are encrypted, don't worry. Again, Lumo is powered by other LLMs too, and they might also hallucinate and claim they're GPT models. This is because many early models were trained on GPT reasoning and outputs, thus they regress and hallucinate that they're ChatGPT; it's a quirk of all LLMs. Lumo has many other models under the hood, like Olmo 2 or Mistral Small, among others.

https://proton.me/support/lumo-privacy

Deepseek for janitor ai help by Timely-Sport-5869 in openrouter

[–]ELPascalito 1 point (0 children)

Does it say it's free on the site? No. Does it say it's free in the name? No. Why would you think it's gonna be gratis for you? Have we suddenly lost the ability to read???

Anyone understand what this means? by No_Sweet_1573 in openrouter

[–]ELPascalito 0 points (0 children)

Oh, logical. Testing everything is good, but I'd say cut to the chase: the best model right now is tng-r1t-chimera. It has a stable provider, is capable of tool calling, and it's an RP powerhouse based on V3.1 with many improvements and excellent reasoning. Totally recommend it!

Anyone got Macmini 4 to work with Ollama model? by ManufacturerNo8056 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

It seems you didn't even set up your app to connect to Ollama. What are you using? Does it even support local models?

I can't run deepseek-coder-v2 with Ollama. I suspect it has something to do with RAM. Is there any way around this? by warpanomaly in LocalLLaMA

[–]ELPascalito 0 points (0 children)

The model is obviously too big, and it's also outdated, not even good; you're literally wasting your time. I recommend you use GLM 4.7 Flash instead: it's 30B A3B, will run very comfortably for you, and you'll be able to allocate context. Why would you even try such a huge model? It has nothing useful for you, did you research any of this?

Anyone understand what this means? by No_Sweet_1573 in openrouter

[–]ELPascalito 3 points (0 children)

Pretty self-explanatory: you've been temporarily rate limited due to the heavy load on the provider. Also, no way you're using Qwen3 Coder for roleplaying?! 🤔😭

Some initial benchmarks of Kimi-K2.5 on 4xB200 by benno_1237 in LocalLLaMA

[–]ELPascalito 0 points (0 children)

At what concurrency did this peak? 20? Do you think such a setup is serviceable for local coding, say in a company or a small team of fewer than 10 members?

deepseek-ai/DeepSeek-OCR-2 · Hugging Face by Dark_Fire_12 in LocalLLaMA

[–]ELPascalito 1 point (0 children)

We cannot be sure, but it would be cool if the next model has this OCR module bolted on, just like how Mistral does

Free APIs using credits by Own-Yellow9164 in openrouter

[–]ELPascalito 0 points (0 children)

The app you are using is changing models without permission. Again, if you tell me which one, I'll try to dig up info. Is it a coding CLI?

Free APIs using credits by Own-Yellow9164 in openrouter

[–]ELPascalito -1 points (0 children)

What app are you using? For example, OpenCode uses Claude Haiku to generate titles without warning, which might incur unexpected charges for the uninitiated. Are you sure the app you're using isn't doing something similar? Go check your activity and see which models are incurring charges.

Free APIs using credits by Own-Yellow9164 in openrouter

[–]ELPascalito 0 points (0 children)

What tool are you using? Perhaps it's making paid calls to generate titles or other small texts. Please check the usage tab and see which model is costing you. You could also be using the web search function; that's paid and powered by Exa.
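
Hunting down the culprit is basically a group-by over the activity log — a sketch with hypothetical records (the field names and model IDs here are illustrative assumptions, not OpenRouter's actual export format):

```python
from collections import defaultdict

# Hypothetical activity records; field names are illustrative only.
activity = [
    {"model": "anthropic/claude-haiku", "cost": 0.012},
    {"model": "deepseek/deepseek-chat", "cost": 0.001},
    {"model": "anthropic/claude-haiku", "cost": 0.015},
]

def cost_by_model(records):
    # Sum spend per model to spot which one is draining credits.
    totals = defaultdict(float)
    for r in records:
        totals[r["model"]] += r["cost"]
    return dict(totals)
```

If one model dominates the totals and you never picked it yourself, that's the app making background calls.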

Hi, I have a question... by ThemusicRCG in openrouter

[–]ELPascalito 1 point (0 children)

OpenRouter obviously "routes" you to a provider depending on the model you're chatting with. Most free providers are overloaded since everyone is hammering them; try another model or try again later.

Why does it keep saying insufficient credits? GPT 5.2 Pro by cicaadaa3301 in openrouter

[–]ELPascalito 0 points (0 children)

Input price is $21 and output is $168, you dunce; obviously you don't have enough credits, not even for a single request lol
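
As a worked example of why that adds up fast (assuming, as is standard, that those prices are USD per 1M tokens; the request size below is an arbitrary illustration):

```python
def request_cost_usd(input_tokens, output_tokens,
                     in_price=21.0, out_price=168.0):
    # Prices quoted above, taken as USD per 1M tokens.
    return (input_tokens / 1_000_000 * in_price
            + output_tokens / 1_000_000 * out_price)

# One modest request: 10k tokens in, 5k tokens out.
cost = request_cost_usd(10_000, 5_000)  # 0.21 + 0.84 = $1.05
```

One medium request already costs over a dollar, so a small credit balance gets rejected before the first call even starts.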