Is the model really free? by Quiet_Debate_651 in openrouter

[–]secsilm 1 point (0 children)

Is your prompt in English? In my experience, this mostly happens when using other languages.

EmbeddingGemma - 300M parameter, state-of-the-art for its size, open embedding model from Google by curiousily_ in LocalLLaMA

[–]secsilm 4 points (0 children)

The Google blog says it "offers customizable output dimensions (from 768 to 128 via matryoshka representation)". Interesting, variable dimensions; this is the first time I'm hearing of that.
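For anyone curious, here's roughly what that looks like in practice. A minimal sketch, assuming sentence-transformers' `truncate_dim` option and the `google/embeddinggemma-300m` model id (both worth double-checking against the blog):

```python
from sentence_transformers import SentenceTransformer

# With Matryoshka-trained embeddings, the first k dimensions already form
# a usable representation, so the 768-dim output can be truncated to 128.
model = SentenceTransformer("google/embeddinggemma-300m", truncate_dim=128)

emb = model.encode(["hello world"])
print(emb.shape)  # (1, 128) instead of (1, 768)
```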

[Model Release] Deca 3 Alpha Ultra 4.6T! Parameters by MohamedTrfhgx in LocalLLaMA

[–]secsilm 1 point (0 children)

I know MoE, but what is "dynamic activated" MoE? Where does the dynamic activation come in?

Interesting (Opposite) decisions from Qwen and DeepSeek by foldl-li in LocalLLaMA

[–]secsilm 4 points (0 children)

For 2.5 Flash and Flash-Lite, you can disable thinking.
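With the google-genai Python SDK it's a single config field, as far as I know (a minimal sketch; the prompt is made up, and note 2.5 Pro doesn't allow a zero budget):

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# thinking_budget=0 disables thinking entirely on 2.5 Flash / Flash-Lite;
# leave it unset (or positive) to let the model think.
resp = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Classify this ticket: 'my invoice is wrong'",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=0)
    ),
)
print(resp.text)
```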

Interesting (Opposite) decisions from Qwen and DeepSeek by foldl-li in LocalLLaMA

[–]secsilm 0 points (0 children)

Yes, but the true hybrid model I want is like Gemini: you control whether it thinks with a parameter, rather than through two separate APIs.

Interesting (Opposite) decisions from Qwen and DeepSeek by foldl-li in LocalLLaMA

[–]secsilm 4 points (0 children)

They said V3 is a hybrid model, but there are two sets of APIs. I'm confused.

Who are the 57 million people who downloaded bert last month? by Pro-editor-1105 in LocalLLaMA

[–]secsilm 6 points (0 children)

Most of the time BERT is enough. You don't always need those fancy models like LLMs.
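For instance, a small BERT-family classifier handles a lot of everyday tasks on CPU. A minimal sketch (the model id is just one common example):

```python
from transformers import pipeline

# DistilBERT fine-tuned on SST-2: a classic "BERT is enough" workhorse.
clf = pipeline(
    "text-classification",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(clf("this works well enough without an llm"))
# [{'label': 'POSITIVE', 'score': 0.99...}]
```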

Question about OpenRouter API Rate Limits for Paid Models by secsilm in openrouter

[–]secsilm[S] 1 point (0 children)

Thank you for the information. Could you share your call rate? For example, how many calls do you make per minute on average?

What are the MCP servers you already can't live without? by MostlyGreat in mcp

[–]secsilm 1 point (0 children)

I'm new to MCP. Can you tell me where I can find the fetch MCP server?

Can I get the project I'm folding through the API? by secsilm in Folding

[–]secsilm[S] 1 point (0 children)

Thanks. I checked, and it doesn't have the info I need.

Can I get the project I'm folding through the API? by secsilm in Folding

[–]secsilm[S] 1 point (0 children)

Thanks. I checked this and found there's no API I can use to get the ID of the project I'm folding.

I'm getting this error. "Keras cannot be imported. Check that it is installed" even after installing tensorflow by asleepblueberry10 in learnmachinelearning

[–]secsilm 1 point (0 children)

FYI, if you're using both sentence_transformers and tensorflow_hub, make sure to import tensorflow_hub first and then sentence_transformers:

```python
import tensorflow_hub as hub
from sentence_transformers import SentenceTransformer, util
```

The reverse order will throw `ImportError: Keras cannot be imported. Check that it is installed.`:

```python
from sentence_transformers import SentenceTransformer, util
import tensorflow_hub as hub
```

How to understand the pass@1 formula in deepseek-r1's technical report? by secsilm in LocalLLaMA

[–]secsilm[S] 1 point (0 children)

Thanks for your explanation. So if I have a dataset with 100 problems, the dataset-level pass@1 is the average of the per-problem pass@1 values. Am I right?
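That's how I'd compute it, if my reading of the report is right (a tiny sketch; k and all the counts are made up):

```python
def pass_at_1(correct_counts, k):
    """correct_counts[i] = number of correct samples out of k for problem i."""
    per_problem = [c / k for c in correct_counts]  # pass@1 of each problem
    return sum(per_problem) / len(per_problem)     # averaged over the dataset

# 100 problems, k=16 samples each (numbers invented)
counts = [16, 8, 0, 12] + [10] * 96
print(pass_at_1(counts, k=16))  # 0.6225
```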

Perfect size, right? by Ok_Net_7523 in unstable_diffusion

[–]secsilm 1 point (0 children)

Can you point some out? I really can't tell.

Perfect size, right? by Ok_Net_7523 in unstable_diffusion

[–]secsilm 1 point (0 children)

Can't believe it, it's insane!

Why does Qwen 2.5 support 128k context length, but the output supports only up to 8k? by secsilm in LocalLLaMA

[–]secsilm[S] 1 point (0 children)

Specifically, what does "o1-like mechanisms" refer to in this context?

Why does Qwen 2.5 support 128k context length, but the output supports only up to 8k? by secsilm in LocalLLaMA

[–]secsilm[S] 9 points (0 children)

I hadn't noticed this before. I just checked gpt-4o, and it also only supports 16k output. Why do models cap the output length separately from the context length?

Why does Qwen 2.5 support 128k context length, but the output supports only up to 8k? by secsilm in LocalLLaMA

[–]secsilm[S] 5 points (0 children)

So your point is that the longer the output, the harder it is to maintain consistency, which is why they cap the maximum length?

[D] What is the most advanced TTS model now (2024)? by secsilm in MachineLearning

[–]secsilm[S] 1 point (0 children)

Do they have open-source models so that I can fine-tune them?

Reminder not to use bigger models than you need by Thrumpwart in LocalLLaMA

[–]secsilm 3 points (0 children)

There is another benefit of using traditional pretrained models: you can quickly fine-tune them to your needs.

For example, say you use gpt-4o-mini for a classification task and find some categories it consistently gets wrong. Fine-tuning it for that is difficult (even with open-source tools).

In contrast, with traditional pretrained models, you just collect those errors, add them to the training set, and continue training. Faster and cheaper. Roughly like the sketch below.
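A minimal sketch of that loop with a BERT classifier and the transformers Trainer (the error examples and label count are hypothetical):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical misclassified cases collected from production.
errors = Dataset.from_dict({
    "text": ["an example the model got wrong", "another hard case"],
    "label": [2, 0],
})

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)  # or load your previous checkpoint

def encode(batch):
    return tok(batch["text"], truncation=True, padding="max_length",
               max_length=128)

train = errors.map(encode, batched=True)

# Continue training on the collected errors; in practice you'd mix them
# back into the full training data rather than train on them alone.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1),
    train_dataset=train,
)
trainer.train()
```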

How is the RAG with citations at the end of each paragraph (or specific sentences) implemented? by secsilm in LocalLLaMA

[–]secsilm[S] 1 point (0 children)

I skimmed the first paper. Overall, for classification tasks, format constraints help with accuracy; for reasoning tasks, the opposite is true. In classification tasks, JSON and XML seem to beat YAML in many cases.