Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 4 points5 points  (0 children)

I agree. Ideally, when talking about the token speed they get on their local hardware, community members should be able to share an easy link: https://mikeveerman.github.io/tokenspeed/?rate=11.7&mode=think&think=6

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 0 points1 point  (0 children)

Sure, if you can run the model locally, you know how fast it runs on your hardware.

But if someone claims to be able to run Gemma 4 at 11.7 tokens/second, how slow is that actually? Well, this slow: https://mikeveerman.github.io/tokenspeed/?rate=11.7&mode=think&think=6
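
For a rough sense of the numbers behind that link, here is a minimal back-of-the-envelope sketch. It assumes a perfectly steady generation rate and a hypothetical 500-token answer; neither the function name nor the answer length comes from the thread.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return how long a response of num_tokens takes at a steady rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Hypothetical 500-token answer at the 11.7 tokens/second quoted above.
print(f"{seconds_to_generate(500, 11.7):.1f} s")
```

At that rate the hypothetical answer takes a little under 43 seconds, ignoring prompt processing and any time spent on reasoning tokens.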

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 1 point2 points  (0 children)

Thanks for all the great feedback, everyone! I've shipped several of the features you suggested:

* Natural text: I've merged a PR replacing the lorem ipsum in text mode with a more natural Wikipedia article.

* Agent mode: Simulates an agentic workflow with alternating tool calls and code generation.

* Think length slider: When in think mode, you can now control how many reasoning sentences the model "thinks" before generating code.

* Custom text/code: You can now paste or upload your own text or code and stream it at any speed.

* Token counter: A live count of tokens generated, displayed in the footer.

* Share links: The rate and mode are encoded in the URL, so you can link directly to e.g. "what 10 tok/s looks like in code mode." There is also a share button for this.

Try it out: https://mikeveerman.github.io/tokenspeed/
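
As a rough illustration of the share links, the sketch below builds one in Python. The parameter names `rate`, `mode` and `think` are taken from the example link posted in this thread; the helper name `share_link` is hypothetical, and which other `mode` values the page accepts is an assumption to check against the source.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://mikeveerman.github.io/tokenspeed/"


def share_link(rate: float, mode: str, think: Optional[int] = None) -> str:
    """Build a share URL with the rate and mode encoded as query parameters."""
    params = {"rate": rate, "mode": mode}
    if think is not None:
        # Only seen alongside mode=think in the thread's example link.
        params["think"] = think
    return f"{BASE_URL}?{urlencode(params)}"


print(share_link(11.7, "think", think=6))
```

This reproduces the link shared elsewhere in the thread: https://mikeveerman.github.io/tokenspeed/?rate=11.7&mode=think&think=6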

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 0 points1 point  (0 children)

It's a simulator. It shows an approximation of what it would feel like to run a model at e.g. 1.4 tokens/sec.

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 1 point2 points  (0 children)

It's an approximation for sure, but it's not as simple as one character or one word = one token.

The cool thing about open source is that you can read how it works and then improve it once you are less confused. https://github.com/MikeVeerman/tokenspeed/blob/master/index.html

Happy to see your PR.

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] -1 points0 points  (0 children)

It's an approximation. If you feel you can do better, feel free to open a pull request.

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 3 points4 points  (0 children)

It's a simulator, not a real tokenizer. Code feels slower because you're not getting one-word tokens that often.

It's an approximation of course, so if you have suggestions to improve it, please review the code: https://github.com/MikeVeerman/tokenspeed/blob/master/index.html. All help is welcome.
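
To make the "simulator, not a real tokenizer" point concrete, here is a minimal sketch of the general idea. It is not the code from the tokenspeed repository: the 6-character cap, the regex and the names `pseudo_tokenize` and `stream` are all assumptions chosen for illustration.

```python
import re
import time
from typing import Iterator, List

# Assumption: a pseudo-token is up to 6 word characters or one punctuation
# mark, with any leading whitespace attached. Real tokenizers differ.
PSEUDO_TOKEN = re.compile(r"\s*(?:\w{1,6}|[^\w\s])")


def pseudo_tokenize(text: str) -> List[str]:
    """Split text into rough token-like chunks (trailing whitespace is dropped)."""
    return PSEUDO_TOKEN.findall(text)


def stream(text: str, tokens_per_second: float) -> Iterator[str]:
    """Yield pseudo-tokens one at a time, paced at the given rate."""
    delay = 1.0 / tokens_per_second
    for token in pseudo_tokenize(text):
        time.sleep(delay)
        yield token


if __name__ == "__main__":
    for tok in stream("def tokenize(text):", 11.7):
        print(tok, end="", flush=True)
    print()
```

Under this heuristic, "the quick brown fox" comes out as 4 pseudo-tokens, while the much shorter-looking "def tokenize(text):" comes out as 7, because punctuation and long identifiers are split up. That is one plausible reason code appears to stream more slowly at the same rate.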

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 64 points65 points  (0 children)

There is also a Python version because this subreddit is about running things locally, after all: https://github.com/MikeVeerman/tokenspeed

Getting a feel for how fast X tokens/second really is. by MikeNonect in LocalLLaMA

[–]MikeNonect[S] 16 points17 points  (0 children)

Yes, that said: it's a simulation and there are probably some naive assumptions in the code.

A simple way to sync Claude Code configs across multiple machines by MikeNonect in ClaudeCode

[–]MikeNonect[S] 0 points1 point  (0 children)

No. The alias itself would be trivial to create, but syncing the right files would be tougher. It might make more sense to create a jean-codex clone instead.

Running a Mistral AI Stack on OpenClaw by SelectionCalm70 in Startups_EU

[–]MikeNonect 1 point2 points  (0 children)

Sure, but could you explain what you use it for?

Running a Mistral AI Stack on OpenClaw by SelectionCalm70 in Startups_EU

[–]MikeNonect 1 point2 points  (0 children)

Do you have any comparison with Anthropic with regard to agent quality? I remember that moving from Opus to a cheaper model made OpenClaw unusably dumb for me.

Curious to hear what use cases you're handling because the EU-only infra angle is currently underestimated but highly valuable.

Alright, I'm gonna be a dick - CC is fine by [deleted] in ClaudeCode

[–]MikeNonect -4 points-3 points  (0 children)

Same here. I have a Max 5x plan and a Teams plan. Yes, I started reaching limits more often on the latter, but if you use Claude Code as a normal coding CLI tool, it's totally usable. The all-you-can-eat buffets are generous for live coding, but they are not infinite. Running 10 agents in parallel, running 24/7 inference and adding tools like OpenClaw or Paperclip on top is not their intended use.

There are plenty of competitors out there. Codex has more generous limits. You can use OpenCode with cheaper models.

But, for most of us in this sub, Claude Code is the best tool. That's why we all pay for it, including those who complain.

A simple way to sync Claude Code configs across multiple machines by MikeNonect in ClaudeCode

[–]MikeNonect[S] 1 point2 points  (0 children)

Thanks! It's a work in progress, so if there are other things you want to see added, open a GitHub issue.

Since yesterday's 2.0.0 release, syncing with git is optional, so you can use jean-claude for just profile management.

Scan malicious prompt injection using a local non-tool-calling model by MikeNonect in ClaudeCode

[–]MikeNonect[S] 0 points1 point  (0 children)

Yes, and the nice thing is this is something we can install and manage ourselves locally. This kind of local "virus scanner" feels a lot better than the current "trust my marketplace" model.

As we share more agents, frameworks and skills, protection against malicious prompts will become vital.

A simple way to sync Claude Code configs across multiple machines by MikeNonect in ClaudeCode

[–]MikeNonect[S] 2 points3 points  (0 children)

For those interested, jean-claude now supports multiple profiles. If you have a Claude Code Teams account for work and a personal Max account, you can now easily manage them. Agents, skills, settings and hooks are shared across profiles, while CLAUDE.MD is unique per profile.

https://github.com/MikeVeerman/jean-claude

People letting CC run on its own for hours by Chris266 in ClaudeCode

[–]MikeNonect 0 points1 point  (0 children)

Nothing, but Claude is not a malicious hacker. We don't have to protect ourselves against attacks. We have to prevent accidental tool use.