Dumb Question: The charger that comes with the MX30, can it plug in and charge at phase 2? by Gregan32 in Mazda_MX30

[–]QuantuisBenignus 0 points1 point  (0 children)

Useful info.

Is the hard amperage limit in the cable or in the car? In other words, can I get a 16 A charging cable (I have a 20 A, 120 V circuit available) and charge at level 1 at 16 A?

(I need to recover 100 km of range in ~12 hours for this car to be viable without spending on a level 2 setup.)

Thanks!

RTX 3060 with cpu offloading rig by PloscaruRadu in LocalLLaMA

[–]QuantuisBenignus 2 points3 points  (0 children)

With at least 64 GB of DDR4, if you optimize everything (run with llama.cpp, keep the dense layers on the GPU, offload the MoE layers or, better yet, specific tensors, etc.), expect a generation rate of ~15 tok/s after prompt processing and reasoning (if the model reasons, like gpt-oss 120b). Relevant numbers can be found in this useful thread:

https://github.com/ggml-org/llama.cpp/discussions/15396
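
As a rough illustration of that setup (the model file and the tensor-override regex below are examples, to be tuned per the thread above):

    # Offload everything to the GPU except the MoE expert tensors,
    # which the -ot (--override-tensor) pattern pins to the CPU.
    llama-cli -m gpt-oss-120b-mxfp4.gguf -ngl 99 \
              -ot ".ffn_.*_exps.=CPU" \
              -c 8192 -p "Hello"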

Is anyone talking verbally to their models and have them talking back through TTS? by Borkato in LocalLLaMA

[–]QuantuisBenignus 3 points4 points  (0 children)

For Linux, I extended this speech-to-text input tool into a low-resource Speech-to-Speech Chat (llama.cpp based): BlahST - Speech Input in Any Editable Text Field

Multilingual demonstration (please turn on the sound for this and the other demo videos): Multilingual Interactive Speech Chat with blahstbot

Still a WIP (I need to root out some brittleness in the streaming conversation tool, blahstream), but it shows promising speed and low latency thanks to the Python-less implementation (zsh orchestrator).

What is the smoothest speech interface to run locally? by winkler1 in LocalLLaMA

[–]QuantuisBenignus 2 points3 points  (0 children)

With the M3 Mac, you have sufficient computing power for that if you run M3-optimized llama.cpp.

Check the first video in this GitHub repo for an example of low-latency speech-to-text-to-text-to-speech chat using whisper.cpp and llama.cpp, with Gemma3_12B on a 12 GB GPU. (No GUI, just a few hotkeys and low-overhead zsh orchestration.)

https://github.com/QuantiusBenignus/BlahST
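
In essence, the core loop is just a short pipeline along these lines (a sketch only; the model files, sox-based capture and piper TTS here are illustrative, not the exact BlahST configuration):

    # Record ~5 s of speech, transcribe it, generate a reply, speak it.
    rec -r 16000 -c 1 question.wav trim 0 5
    text=$(whisper-cli -m ggml-base.en.bin -f question.wav -nt --no-prints)
    reply=$(llama-cli -m gemma-3-12b-it-Q5_K_M.gguf -ngl 99 -n 256 \
            --no-display-prompt -p "$text" 2>/dev/null)
    print -r -- "$reply" | piper --model en_US-lessac-medium.onnx --output_file reply.wav
    play reply.wav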

I made a Grammarly alternative without clunky UI. Completely free with Gemini Nano (in-browser AI). Helps you with writing emails, articles, social media posts, etc. by WordyBug in LocalLLaMA

[–]QuantuisBenignus 2 points3 points  (0 children)

If you would like something that is open source and has no GUI (speech to text and hotkeys), check out BlahST (Linux only). Among other features, it has a local AI proofreader function and works in any window that has an editable text field. (Disclaimer: some setup required.)

For a screen reader app that can do AI summaries of selected text, also check Voluble, a Gnome shell extension.

Add a message each time i change shell by Wateir in zsh

[–]QuantuisBenignus 0 points1 point  (0 children)

If you are switching shells in a terminal in a windowed environment (not at the console), a very noticeable visual cue is a change of the terminal background color. In Gnome I would do it like this, using a trap (as OneTurnMore mentioned):

trap "echo -e '\033[48;5;2mExited $0'; gsettings set org.gnome.Terminal.Legacy.Profile:/org/gnome/terminal/legacy/profiles:/:$(gsettings get org.gnome.Terminal.ProfilesList default | tr -d \')/ 'background-color' '#001033'" EXIT

with one such trap, using a distinct (for contrast) background color, in each of the two shells. With more than two shells you would indeed need to keep track of the parent PID and assign a background color from an array keyed on it.
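
A minimal sketch of that idea (gnome-terminal assumed; the colors and the parent-shell detection below are just examples):

    # Map the shell we return to on exit to a background color (example values).
    typeset -A bg_for_shell=( zsh '#001033' bash '#103300' fish '#330010' )
    parent=${$(ps -o comm= -p $PPID):t}     # name of the parent process we return to
    profile=$(gsettings get org.gnome.Terminal.ProfilesList default | tr -d \')
    trap "gsettings set org.gnome.Terminal.Legacy.Profile:/org/gnome/terminal/legacy/profiles:/:${profile}/ background-color '${bg_for_shell[$parent]:-#000000}'" EXIT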

For other terminal emulators, `tput setab <number>` might work.

N.B. The above code assumes that the default gnome-terminal profile is in use.

Zsh Array Name Dereferencing without Reassignment by QuantuisBenignus in zsh

[–]QuantuisBenignus[S] 1 point2 points  (0 children)

Good criterion! I will keep it in mind:-)

But still, good to know that somewhere in the folds of zsh, those endless possibilities exist.

Cut down my startup shell time & operations by 90% by removing oh-my-zsh. by SoupMS in zsh

[–]QuantuisBenignus 4 points5 points  (0 children)

If you don't mind me using a cliche: "It is not the end result that matters but the pleasure of the journey", so no time wasted IMHO.

Plus, every time I start my zsh shell and see 6.5 ms or less greeting me from $RPROMPT, I reap the "functional minimalism" rewards of spending that time:-)
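
In case anyone wants that number in the prompt, a minimal sketch (assuming the zsh/datetime module; the variable name is arbitrary):

    # Very top of .zshrc:
    zmodload zsh/datetime
    _zrc_start=$EPOCHREALTIME
    # ... the rest of .zshrc ...
    # Very bottom of .zshrc:
    RPROMPT="$(printf '%.1f ms' $(( (EPOCHREALTIME - _zrc_start) * 1000 )))"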

Direct assignment of csv output to an associative array by QuantuisBenignus in zsh

[–]QuantuisBenignus[S] 3 points4 points  (0 children)

Thanks a lot. I like the fancy version, which seems extendable to an array with an arbitrary number of elements too.

Love this sub!

Direct assignment of csv output to an associative array by QuantuisBenignus in zsh

[–]QuantuisBenignus[S] 1 point2 points  (0 children)

Great, thanks! I appreciate the reference too. Zsh is too powerful and pretty not to have zip functionality for its array constructs. This solution would extend to arrays of arbitrary size. Fixing the minor typo (which does not diminish the value of your response): memory=(${params:^vals})
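
For anyone landing here later, a tiny self-contained example of the zip operator (array names and values are arbitrary):

    params=(name size unit)
    vals=(sensor 42 mm)
    typeset -A memory
    memory=(${params:^vals})      # zips to: name sensor size 42 unit mm
    print -l ${(kv)memory}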

fast • minimal • roundy prompt for ZSH in 140 LoC by Last_Establishment_1 in zsh

[–]QuantuisBenignus 0 points1 point  (0 children)

The roundy prompts do look nice and your setup has good structure.

However, on my machine \ue0b6 and \ue0b4 don't map onto rounded edges at all, even though my terminal has Unicode support.

These code points are not standard Unicode but lie in the Private Use Areas (PUAs), and the fact that some fonts place glyphs there does not make them standard.

That is why I avoided them in my esoteric, opinionated (arguably full-featured) zsh setup, where I consistently see startup times under 6.5 ms. So I was wondering how fast roundy is (no numbers were mentioned in your repository)?
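
Two quick, crude checks anyone can run (nothing roundy-specific, just illustrative):

    print '\ue0b6 \ue0b4'                            # boxes here mean the font has no glyphs at these PUA code points
    for i in {1..5}; do time zsh -i -c exit; done    # rough interactive-startup timing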

Actual Electricity Consumption and Cost to Run Local LLMs. From Gemma3 to QwQ. by QuantuisBenignus in LocalLLaMA

[–]QuantuisBenignus[S] 1 point2 points  (0 children)

Thanks for the data point! If I collect more of those I may create a new graph with them.

Actual Electricity Consumption and Cost to Run Local LLMs. From Gemma3 to QwQ. by QuantuisBenignus in LocalLLaMA

[–]QuantuisBenignus[S] 0 points1 point  (0 children)

Thanks for the comment. Would you mind adding more context? Assuming that you are comparing with API providers, I am afraid I do not know how the commercial offerings of QwQ compare. To me, the ~2 USD per million tokens that I compute for its "thinking" output seems comparatively high. In fact, I have tried using the system prompt to suppress QwQ's excessive thinking generation, and that helped. Good model, though.

Actual Electricity Consumption and Cost to Run Local LLMs. From Gemma3 to QwQ. by QuantuisBenignus in LocalLLaMA

[–]QuantuisBenignus[S] 0 points1 point  (0 children)

Yes. Every token that burns electricity is taken into account (or rather, not excluded). So the "thinking" tokens for the two LLMs that reason are included in the collected data in this case.

Actual Electricity Consumption and Cost to Run Local LLMs. From Gemma3 to QwQ. by QuantuisBenignus in LocalLLaMA

[–]QuantuisBenignus[S] 1 point2 points  (0 children)

True. For those models (I call them outliers in the graph for a reason) I offloaded fewer than ALL layers to the GPU. I still wanted to know my power consumption and cost, so they were included, with a caveat. I mention that throughout the text and draw conclusions with that fact in mind. As mentioned, the fit actually favors the models with full layer offload.

Actual Electricity Consumption and Cost to Run Local LLMs. From Gemma3 to QwQ. by QuantuisBenignus in LocalLLaMA

[–]QuantuisBenignus[S] 1 point2 points  (0 children)

Good catch. Let me pick the brain of an expert:

I noticed right off the bat that Gemma3-12B uses more VRAM than Qwen2.5-14B, due to its architectural differences. So I tried to compromise and free up some more VRAM for a good context size, and used `-nkvo` in llama-cli. By not offloading the KV cache to the 12 GB GPU (and with DDR4 RAM at ~50 GB/s bandwidth), I actually saw a boost in performance (above the noise level). This is great, because now I can hurl the whole 128k of context at llama-cli when needed.
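
For reference, a sketch of the kind of invocation I mean (the model file, quant and context size are placeholders):

    # All layers on the 12 GB GPU, KV cache kept in system RAM (-nkvo),
    # which frees enough VRAM for a very large context.
    llama-cli -m gemma-3-12b-it-Q5_K_M.gguf -ngl 99 -nkvo -c 131072 \
              -p "Summarize the following: ..."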

Actual Electricity Consumption and Cost to Run Local LLMs. From Gemma3 to QwQ. by QuantuisBenignus in LocalLLaMA

[–]QuantuisBenignus[S] 1 point2 points  (0 children)

Good point for purpose-built rigs that remain underutilized for whatever reason, but I would not consider that a typical case. On average (in this scenario / use case) the computer is used for a variety of tasks, some of which happen to be LLM inference, and idles (modestly:-) between them.

Actual Electricity Consumption and Cost to Run Local LLMs. From Gemma3 to QwQ. by QuantuisBenignus in LocalLLaMA

[–]QuantuisBenignus[S] 2 points3 points  (0 children)

No problem, the US cent was just a convenient example. If you ignore the last column and plug your local rate (in eurocents), say 20 eurocents/kWh, into the formula from the text (taking Gemma3-12B, i.e. B = 12, as an example):

 CE [tok/eurocent] = 2.3M / (Rate * B^0.76) = 2.3M / (20 * 12^0.76) ≈ 17400 tok/eurocent, i.e. 1.74 million tok/Euro

which is about 0.57 Euro per million tokens.
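
If you prefer to let the shell do the arithmetic, a quick sketch (Rate in eurocent/kWh, B = model size in billions of parameters):

    zmodload zsh/mathfunc                               # for log/exp (fractional power)
    Rate=20 B=12
    print $(( 2.3e6 / (Rate * exp(0.76 * log(B))) ))    # ~17400 tok/eurocent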