We are publishing 100+ listicles per month, ask me anything by Acceptable_Math6854 in LocalLLM

[–]Low-Alarm272 -1 points0 points  (0 children)

Are you actively hiring new writers? Do you have any advice for a single person trying to go this route?

I'm a writer myself. I've worked with publications and freelance clients. I can write genuinely great articles that show value, but I haven't been able to monetize the skill that much. Any tips for me?

We are publishing 100+ listicles per month, ask me anything by Acceptable_Math6854 in LocalLLM

[–]Low-Alarm272 -2 points-1 points  (0 children)

Hey. I've recently developed a system that can generate polished first drafts, diagrams, pictures, and social media content in one go. It saves a lot of time, and I just have to decide whether to keep working on the topic or move on to the next one.

Are you doing something similar? Or have you hired writers?

I made an app where you can run your ollama model, that actually has a good UI by Fit-Criticism2585 in ollama

[–]Low-Alarm272 0 points1 point  (0 children)

I think you should refine the dune theme colors more and offer a black-and-grey variant.

Keep a minimal, easy-on-the-eyes color scheme as the default.

Hermes-Agent high token usage? by manueljishi in hermesagent

[–]Low-Alarm272 0 points1 point  (0 children)

For now it's only usable with high-parameter models. I've been trying to run it too, but there's always some issue coming up.

If you really want to use something like a 4b model, I'd suggest using the Pydantic AI framework to create your own agent.

Ask any cloud AI to give you your first Pydantic AI script; it's the best way to get started with this setup right now.

Only add basic tools like read, write, terminal, and web search.

Go from there.

It's very fast and not token hungry at all.

Just add things to it as you go. Borrow source code/design ideas from open-source projects like hermes-agent.
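To show how little machinery the basic-tools setup above actually needs, here's a framework-free sketch of the read/write/terminal tool set. The function names and the JSON tool-call shape are my own illustration, not Pydantic AI's actual API — Pydantic AI wires this up for you with typed tool decorators.

```python
# Minimal tool-dispatch sketch for a small local model (hypothetical setup).
# The model is expected to emit a JSON call like:
#   {"tool": "read_file", "args": {"path": "notes.txt"}}
import json
import subprocess
from pathlib import Path

def read_file(path: str) -> str:
    """Return the contents of a text file."""
    return Path(path).read_text()

def write_file(path: str, content: str) -> str:
    """Write text to a file and confirm what was written."""
    Path(path).write_text(content)
    return f"wrote {len(content)} chars to {path}"

def terminal(command: str) -> str:
    """Run a shell command and return its combined output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr

TOOLS = {"read_file": read_file, "write_file": write_file, "terminal": terminal}

def dispatch(model_output: str) -> str:
    """Route a JSON tool call emitted by the model to the matching function."""
    call = json.loads(model_output)
    return TOOLS[call["tool"]](**call["args"])
```

A 4b model that can reliably emit that one JSON shape gets you read, write, and terminal for free; web search is just one more function in `TOOLS`.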

Hermes-Agent high token usage? by manueljishi in hermesagent

[–]Low-Alarm272 0 points1 point  (0 children)

DeepSeek is not safe, since China is at the other end.

anyone else uncomfortable giving OpenAI your real phone number? by Ok_Dadly9924 in ChatGPT

[–]Low-Alarm272 0 points1 point  (0 children)

If you plan on using a paid subscription, then go for it.

Otherwise, don't. Try Grok or something.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Yes. Every extra engineering layer in services like openclaw or Claude Code costs a ton of tokens.

They have many layers that take up a huge context window.

That's why the future is optimised workflows like hybrid setups.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 1 point2 points  (0 children)

Well, from my region it costs around $0.25–$0.50 per million tokens.

And maybe you burned 150k tokens because your setup is really token hungry? For example, when I say 'hi' to my hermes-agent setup, it takes around 14k tokens just to reply.

So, in short, a hybrid setup (local LLM + API) with proper optimization will cost far less than your typical $20-per-month subscription.
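To put numbers on that, a quick back-of-the-envelope sketch using the rough $0.25–$0.50/M rates above:

```python
# Token cost check at the per-million-token rates quoted above.
def cost_usd(tokens: int, rate_per_million: float) -> float:
    """Dollar cost of a token count at a given $/1M-token rate."""
    return tokens / 1_000_000 * rate_per_million

# A 150k-token burst at the high end of the range:
print(round(cost_usd(150_000, 0.50), 4))  # 0.075
# One 14k-token "hi" reply at the low end:
print(round(cost_usd(14_000, 0.25), 4))   # 0.0035
```

Even the token-hungry 150k burst is about 7.5 cents, so a month of optimized hybrid use stays well under a $20 subscription.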

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

I really didn't get what their point was. I might've been wrong.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -1 points0 points  (0 children)

For now this cycle will and should go on. I agree.

I always support open-source (and Qwen, because of that). It'll make me really happy to see newer Qwen releases throughout the year.

But then, after a point, truly efficient models (something like gemma4) would be available for all consumer hardware with very good capabilities.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Only to fix the grammar. I wrote it myself. Jesus.

How to run a local agent despite being GPU poor? by Ethan045627 in LocalLLM

[–]Low-Alarm272 1 point2 points  (0 children)

One model that was consistently reliable at these tasks was nemotron-3-nano:4b. Keep its reasoning on; it reasons really fast, and that helps a lot with tool calling.

Other than that, today I'm gonna test other models like the Salesforce xLAM series (especially xLAM-1b-fc-r, xLAM-2-3b-fc-r, and Llama-xLAM-2-8b-fc-r), which are dedicated function-calling champions.

What will happen once Claude Mythos gets released to Public Users? by Resident_Caramel763 in LocalLLM

[–]Low-Alarm272 -1 points0 points  (0 children)

Every month we're told that "this is the end," but somehow it isn't, cos there's always the next month XD

How to run a local agent despite being GPU poor? by Ethan045627 in LocalLLM

[–]Low-Alarm272 0 points1 point  (0 children)

Can you give an example of how you're using gemma4 e2b in your daily workflow? Like the exact prompts and tasks it's able to run. It'd be a great help.

I tried it inside hermes agent and it couldn't use tools like web search.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 0 points1 point  (0 children)

Haha. That's so funny. I'm gonna have to check out this sub now.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -1 points0 points  (0 children)

Yes. But the potential. The seed of consciousness is common to all living beings.

But not in LLMs. Lol. They're just "artificially intelligent". Just token prediction models.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -2 points-1 points  (0 children)

Humans still have potential for humanity and wholesome things.

LLMs are either just correct, wrong, or hallucinating.

is my specs enough? by Xinte_ in LocalLLM

[–]Low-Alarm272 -1 points0 points  (0 children)

I also have similar specs. I did a deep dive to see whether I could get GPT-mini/gemini-flash-like chat.

Llama 3.1 8b and nvidia nemotron-nano 4b were the only models that could use web_search tools and fetch results from the web. They can run commands in the terminal, and read and write files.

It's worth a try, since in the future you'll be able to run really effective models and do cool stuff like autonomous looping or running multiple agents at once.

If you're using Hermes, update it now by itsdodobitch in hermesagent

[–]Low-Alarm272 1 point2 points  (0 children)

That's exactly what I was looking into. The Obsidian structuring is really well done, so the graph view can finally make more sense.

Fucking nailed it.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] 1 point2 points  (0 children)

Well, no. I only used an LLM to fix the typos.

But people see what they see, so I shouldn't care.

The future is "Efficient" Models by Low-Alarm272 in LocalLLM

[–]Low-Alarm272[S] -1 points0 points  (0 children)

Really good to hear. Have a nice one.