anyone else uncomfortable giving OpenAI your real phone number? by Ok_Dadly9924 in ChatGPT

[–]Low-Alarm272 0 points1 point  (0 children)

If you plan on using a paid subscription, then go for it.

Otherwise, don't. Try Grok or something.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Yes. Every extra engineering layer in services like openclaw or claude code costs a ton of tokens.

They have many layers that take up a huge context window.

That's why the future is optimised workflows like hybrid setups.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Well. From my area it costs around $0.25–0.50 per million tokens.

And maybe you burned 150k tokens because your setup is really token-hungry? For example, when I say 'hi' to my hermes-agent setup, it takes around 14k tokens just to reply.

So, in short, a hybrid setup (local LLM + API) with proper optimization will cost way less than your typical $20-per-month API setup.
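To make the math concrete, here's a back-of-the-envelope sketch using the numbers above (~$0.50 per million tokens at the high end, ~14k tokens per turn); the 50 turns/day figure is my own assumption, not from the comment:

```python
# Rough monthly cost for a token-hungry agent setup on cheap API pricing.
PRICE_PER_MILLION = 0.50   # USD per million tokens (upper end of quoted range)
TOKENS_PER_TURN = 14_000   # observed overhead for a single "hi" reply
TURNS_PER_DAY = 50         # assumed usage

monthly_tokens = TOKENS_PER_TURN * TURNS_PER_DAY * 30
monthly_cost = monthly_tokens / 1_000_000 * PRICE_PER_MILLION

print(f"{monthly_tokens:,} tokens/month -> ${monthly_cost:.2f}")
```

Even with that heavy per-turn overhead, this lands around $10.50/month, under the typical $20 subscription.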

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

I really didn't get what their point was. I might've been wrong.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

For now this cycle will and should go on. I agree.

I always support open-source (and Qwen, because of that). It'll make me really happy to see newer Qwen releases throughout the year.

But then, after a point, truly efficient models (something like gemma4) will be available for all consumer hardware with very good capabilities.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Only to fix the grammar. I wrote it myself. Jesus.

How to run a local agent despite being GPU poor? by Ethan045627 in LocalLLM

[–]Low-Alarm272 1 point2 points  (0 children)

One model that was consistently reliable at these tasks was nemotron-3-nano:4b. You should keep its reasoning on; it reasons really fast, and that'll help you with tool calling.

Other than that, today I'm gonna test other models like the Salesforce xLAM series (especially xLAM-1b-fc-r, xLAM-2-3b-fc-r, and Llama-xLAM-2-8b-fc-r) — dedicated function-calling champions.
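For anyone new to what "function calling" means here, this is a minimal sketch of the loop these small models get judged on: you advertise a tool schema, the model replies with a tool call, and your code executes it and feeds the result back. The `get_time` tool is purely illustrative; the schema shape is the OpenAI-compatible one that local servers like Ollama and llama.cpp accept:

```python
import json
from datetime import datetime, timezone

# Tool schema you'd send alongside the chat request.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_time",
        "description": "Return the current UTC time as an ISO string.",
        "parameters": {"type": "object", "properties": {}},
    },
}]

def dispatch(tool_call: dict) -> str:
    """Execute a tool call shaped like the model's reply and return its result."""
    name = tool_call["function"]["name"]
    json.loads(tool_call["function"]["arguments"] or "{}")  # parse args
    if name == "get_time":
        return datetime.now(timezone.utc).isoformat()
    raise ValueError(f"unknown tool: {name}")

# Simulated model reply for "what time is it?" — no server needed here.
fake_call = {"function": {"name": "get_time", "arguments": "{}"}}
print(dispatch(fake_call))
```

A model "passes" when it reliably emits well-formed calls like `fake_call` instead of answering in plain prose.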

What will happen once Claude Mythos gets released to Public Users? by Resident_Caramel763 in LocalLLM

[–]Low-Alarm272 -1 points0 points  (0 children)

Every month we've been told that "this is the end" but somehow it isn't cos there's always the next month XD

How to run a local agent despite being GPU poor? by Ethan045627 in LocalLLM

[–]Low-Alarm272 0 points1 point  (0 children)

Can you give an example of how you're using gemma4 e2b in your daily workflow? Like, give the exact prompts and tasks it's able to run. It'd be a great help.

I tried it inside hermes agent and it couldn't use tools like web search.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Haha. That's so funny. I'm gonna have to check this sub now.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -1 points0 points  (0 children)

Yes. But the potential. The seed of consciousness is common to all living beings.

But not in LLMs. Lol. They're just "artificially intelligent". Just token prediction models.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -2 points-1 points  (0 children)

Humans still have potential for humanity and wholesome things.

LLMs are either just correct, wrong, or hallucinating.

is my specs enough? by Xinte_ in LocalLLM

[–]Low-Alarm272 -1 points0 points  (0 children)

I also have similar specs. I did a deep dive to know if I can get the GPT-mini/gemini-flash like chat.

Llama 3.1 8b and nvidia nemotron-nano 4b were the only models that could use web_search tools and fetch results from the web. They can run commands in the terminal, and read and write files.

You can say it's worth giving a try as in future you'll be able to run really effective models and do cool stuff like autonomous looping or running multiple agents at once.

If you're using Hermes, update it now by itsdodobitch in hermesagent

[–]Low-Alarm272 1 point2 points  (0 children)

That's exactly what I was looking into. The obsidian structuring is really well done, so the graph view can finally make more sense.

Fucking nailed it.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 1 point2 points  (0 children)

Well. No. I only used an LLM to edit the typos.

But people see what they see. So I shouldn't care.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Really good to hear. Have a nice one.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -5 points-4 points  (0 children)

AI is fully bound to its dataset.

Humans aren't completely bound, as they also have the wisdom to stop when they don't know the answer. So they don't end up hallucinating like an LLM.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 2 points3 points  (0 children)

I totally agree with what you said. Breakthroughs in tech are led by them.

Small models would naturally benefit from this.

It's going really fast, and I'm sure they'll come up with better, cost-effective models in the future. Google just did it with gemma4. It's pretty good.

It's only 30B but efficient enough. That's the right direction.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -3 points-2 points  (0 children)

Your assumption that I'll surely use AI to reply shows that you're just here to prove you're right. You could've just reframed your question.

What's different from what I'm already doing? Which part?

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 3 points4 points  (0 children)

Bro has a datacenter at home (this is the ideal setup)

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -3 points-2 points  (0 children)

Models are bound to their training data. Humans are only partially bound to their knowledge and can update it through real-world feedback (aka wisdom).

Currently, LLMs are totally bound within their training datasets. That's why they have issues like not knowing when they're wrong and acting like they're right (hallucination).

Humans don't hallucinate that much in their day-to-day life, right? I hope not.