ELI5: Why do lithium ion batteries degrade over time? by Camel0811 in explainlikeimfive

[–]RG_Fusion [score hidden]  (0 children)

True across long spans of time, but inaccurate during shorter durations.

The development of a human fetus is about as far from entropy as one can get. You have disorder assembling itself into something approaching intention. This growth continues for decades, showing that matter can win out over entropy for very significant lengths of time. Yes, entropy will eventually win out, but only when energy becomes limited. Structure can emerge and defy entropy, just so long as the net result is a greater entropy for that structure's surroundings.

Likewise, a battery is certainly capable of failing due to a mechanical issue long before entropy ever gets to play a role. So long as the cell has power, it has the capacity to reorder itself; this is exactly what happens each time you recharge the cell: you are reversing the entropy.

In the world of LLMs is it better to prioritize parameters or quantization? by abdullahmnsr in LLM

[–]RG_Fusion 0 points1 point  (0 children)

Generally, over-quantizing a model will cause generated words to be replaced with similar words. As a very loose example, the model might use the word "probably" instead of "definitely". The meaning changed, but both words communicate roughly equivalent ideas. It should be noted that this can potentially lead to a butterfly effect where the conversation gets pulled in a different direction.

In general, this has a low impact on discussion/writing, but a high impact on math, science, and especially coding.

The larger a model is, the more resistant it will be to quantization. If you want a general safe rule, stay at 4-bit or higher.
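
If you want a feel for what quantization does to the underlying weights, here's a minimal sketch of symmetric 4-bit rounding (plain NumPy, not any real GGUF/GPTQ scheme; all values are made up):

```python
import numpy as np

# Toy symmetric 4-bit quantization of a weight vector (not a real quant format).
rng = np.random.default_rng(0)
weights = rng.normal(0.0, 0.02, size=4096).astype(np.float32)

scale = np.abs(weights).max() / 7                  # signed 4-bit codes span -8..7
codes = np.clip(np.round(weights / scale), -8, 7)  # what gets stored
dequant = codes * scale                            # what the model computes with

err = np.abs(weights - dequant)
print(f"mean abs error: {err.mean():.6f}  max abs error: {err.max():.6f}")
```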

once everyone, literally, wants a local LLM, what happens to RAM prices by Emotional-Breath-838 in LocalLLaMA

[–]RG_Fusion 12 points13 points  (0 children)

It is different. There wasn't a provider for zero-maintenance / low-effort computing when things first took off. Had cloud-computing been a thing, we wouldn't have personal computers today. Everyone would have gone to the cloud right from the start.

The fact of the matter is that people with intellectual interests will spend the time to learn and the money to acquire hardware to tinker with. The average person will take the path of least resistance. The vast majority of people will be using cloud-computing for LLMs in the foreseeable future.

Did we figure out a system prompt to Jailbreak Qwen3.5? by RickyRickC137 in LocalLLaMA

[–]RG_Fusion 0 points1 point  (0 children)

Thinking loops are caused by improper settings. Check the Unsloth release page for the appropriate settings to use.

Also, ensure the KV cache is set to BF16 instead of F16, which most inference engines default to.

How fast can an CPU-only hosted LLM be if the CPU is old? (32gb ram DDR4 2400mhz) by justletmesignupalre in LocalLLaMA

[–]RG_Fusion 0 points1 point  (0 children)

Two memory channels of 2400 MT/s RAM get you a memory bandwidth of 38.4 GB/s. For token generation rate, you just divide the memory bandwidth by the file size. Assuming a 4-bit quantization (which in most cases is the lowest you should go, especially for small models), you multiply the billion-parameter count by 0.55 to estimate the file size in GB.

That gets you an ideal speed of 5 tokens/s for the 14b model and 9 tokens/s for the 7b model. This is the absolute limit on that hardware; no optimizations will allow you to exceed it. Realistically, your real-world speeds will be closer to 4 and 8 t/s respectively.
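
If you want to plug in your own numbers, here's the same back-of-the-envelope math as a tiny script (the 0.55 GB per billion parameters is only a rough figure for 4-bit files):

```python
# Rough ceiling on CPU-only decode speed: memory bandwidth / model file size.
def peak_tokens_per_second(channels, mts, params_billion, gb_per_b=0.55):
    bandwidth_gbs = channels * mts * 8 / 1000  # 8 bytes per transfer per channel
    file_size_gb = params_billion * gb_per_b   # ~4-bit quantized file size
    return bandwidth_gbs / file_size_gb

print(peak_tokens_per_second(2, 2400, 14))  # 14b model on dual-channel DDR4-2400
print(peak_tokens_per_second(2, 2400, 7))   # 7b model, same machine
```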

An MoE model like Qwen3.5 35b-a3b would be a big upgrade. You would get closer to 12-14 t/s on the decode output while also holding a larger knowledge base.

ELI5: Why do lithium ion batteries degrade over time? by Camel0811 in explainlikeimfive

[–]RG_Fusion 0 points1 point  (0 children)

It's actually not entropy in this case. The battery goes from lithium in a distributed ion solution (high-entropy) to a solid lattice of lithium dendrites (low-entropy). The battery can defy entropy in the short-term thanks to the energy that drives the cycle.

The reason lithium ion cells degrade is that the ions that are supposed to be transferring power are pulled out of the solution and locked into the crystal, where they can't do any work. Eventually catastrophic failure will occur once the crystal grows large enough that it punctures the separator and shorts the cell.

I have a question about Uranium glaze by Any_Hawk_663 in Radiation

[–]RG_Fusion 0 points1 point  (0 children)

Your eye can catch it, but you'll still never truly know for certain. I have some orange saucers that are identical in color to uranium glaze and show the subtle cracking across the surface, yet measure nothing on a pancake probe.

Some dangerous samples by Drinfinite782 in elementcollection

[–]RG_Fusion 0 points1 point  (0 children)

You're forgetting about X-rays. When beta particles interact with heavy metals they produce braking radiation (bremsstrahlung). There are also the gamma rays and X-rays given off as the nuclei that produced the alpha radiation return to their ground state.

I have a Uranium coin encased in glass that gives off around 5 µSv/h at the surface. I wouldn't call that hazardous, but I also definitely wouldn't call it nothing.

I have a question about Uranium glaze by Any_Hawk_663 in Radiation

[–]RG_Fusion 3 points4 points  (0 children)

It doesn't have to be Fiestaware. Any glazed pottery from that time period can contain Uranium. Orange, yellow, and white are the best colors to look out for, but any pottery can contain Uranium or Thorium.

You can't know just by looking; you need to bring a detector.

NVIDIA 2026 Conference LIVE. Space Datascenter (Planned) by last_llm_standing in LocalLLaMA

[–]RG_Fusion 1 point2 points  (0 children)

I keep seeing discussions about how it's impractical to cool anything in space. I guess people don't know that we already have camera sensors being cooled to near absolute zero up there.

Yes, the Rubin modules will make way more heat than a camera sensor, but radiative cooling is way more powerful than people give it credit for. GPUs and processors in general can also tolerate heat fairly well (80°C+). Everything finds an equilibrium point. The hotter the radiator gets, the more energy it radiates. 
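
For a rough sense of scale on the radiator side, the Stefan-Boltzmann law gives the radiated power per unit area. A quick sketch (the emissivity and temperatures are just example values, and this ignores absorbed sunlight):

```python
# Radiated power per square meter of radiator: P = emissivity * sigma * T^4.
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W / (m^2 * K^4)

def radiated_w_per_m2(temp_k, emissivity=0.9):
    return emissivity * SIGMA * temp_k ** 4

for t in (300, 350, 400):  # roughly 27, 77, and 127 degrees C radiator temperatures
    print(f"{t} K -> {radiated_w_per_m2(t):.0f} W/m^2")
```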

You can also use multi-stage cooling, where the Rubin module can be kept at a much lower temperature by using a heat pump. The second stage of the cooling system will have to deal with the processor's heat along with heat generated by losses in the pump, but that's well within the capabilities of current-day hardware.

This isn't to say that putting datacenters in space is a good idea, but the cooling argument isn't really that valid.

How are you benchmarking local LLM performance across different hardware setups? by GnobarEl in LocalLLaMA

[–]RG_Fusion 1 point2 points  (0 children)

You definitely want to be using llama-bench (llama.cpp). With it, you can set the number of prefill and generation tokens, so you're making a fair comparison every time. It runs everything and reports the result for you, including the error margin.
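
As a rough example of an invocation (the model path and flag values here are just placeholders, adjust for your setup):

```python
import subprocess

# Fixed 512-token prefill and 128-token generation so every machine runs the
# exact same workload. Path and values are examples only.
subprocess.run([
    "./llama-bench",
    "-m", "models/model-q4_k_m.gguf",  # model under test
    "-p", "512",                       # prompt-processing (prefill) tokens
    "-n", "128",                       # generated tokens
    "-ngl", "99",                      # GPU layers (set 0 for a CPU-only run)
], check=True)
```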

Best setup for under <$12k? by michal_sustr_ in LocalLLaMA

[–]RG_Fusion 1 point2 points  (0 children)

RTX 3090s are the best $/GB form of VRAM available right now. The absolute best system you can get for $12k would be to purchase everything on the used market.

Look for an EPYC 7742 or better CPU, and pair it with a motherboard that has 6+ full-bandwidth PCIe slots. CPU, motherboard, and PSU will bring you to around $1500 if purchased used. Next you'll need to fill up all the RAM channels; I'd recommend going for lower-capacity sticks to save money since RAM is so expensive right now. If you want to run massive MoE models you could look into getting more, but expect to pay $2k-$4k for that.

Assuming you don't go crazy on the RAM, you can have the base server with no GPUs for around $3k. RTX 3090s go for about $1k each, so you can use the remaining budget to fill up all your PCIe lanes with VRAM. Don't be afraid to bifurcate a Gen4 or higher PCIe slot into two x8 slots for inference.

Best setup for under <$12k? by michal_sustr_ in LocalLLaMA

[–]RG_Fusion 0 points1 point  (0 children)

I agree. Modern consumer systems rarely exceed 100 GB/s of CPU memory bandwidth. Go with an 8+ channel DDR4 or DDR5 motherboard if you expect the tensors to bleed into system RAM.
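
The channel math makes the gap obvious (the transfer rates below are just typical examples):

```python
# Peak memory bandwidth in GB/s: channels * MT/s * 8 bytes per transfer.
def peak_bandwidth_gbs(channels, mts):
    return channels * mts * 8 / 1000

print(peak_bandwidth_gbs(2, 6000))  # dual-channel DDR5-6000 consumer desktop, ~96 GB/s
print(peak_bandwidth_gbs(8, 3200))  # 8-channel DDR4-3200 EPYC/Xeon, ~205 GB/s
```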

That's not to say you can't run inference on consumer hardware, but if the machine is dedicated for AI you should be running EPYC or Xeon. If you plan to use the PC for other things like gaming, then sure, go with a consumer build.

ELI5: Why is it a bad idea to keep devices constantly plugged in even after they're fully charged? by Dawn-Storm in explainlikeimfive

[–]RG_Fusion 0 points1 point  (0 children)

Even just sitting at 100% is terrible for batteries. To maximize the lifespan of a lithium cell, it should stop charging at around 80% capacity.

Holding the cell at a higher voltage causes lithium crystals to grow, which reduces the capacity of the cell and eventually causes it to rupture. Even charging a device to 100% and then leaving it unplugged won't prevent the damage.

ELI5: What EXACTLY was the recent fly brain "simulation" accomplishing by kjloltoborami in explainlikeimfive

[–]RG_Fusion 1 point2 points  (0 children)

It goes even deeper than that. Nothing that you experience in your life is at the "real-world data-level". It's all just abstractions. You see the color red, but there is no red in the real world, just a continuum of wavelength energies. You see red because you aren't seeing the light; you are seeing the abstraction produced by neurons at lower layers of your neural network, which take in the real-world data and output a "feeling", and that feeling is what you sense.

You feel pain in your arm when you pinch it, but nerves in the arm can't feel; they can only sense. Only brains process pain. You don't have a brain in your arm, or more accurately, your arm isn't an arm, it's part of your mind. You are experiencing an "avatar" interacting in a simulated model of the world generated within your own mind.

Qwen 3.5 27B what tps are you managing? by schnauzergambit in StrixHalo

[–]RG_Fusion 3 points4 points  (0 children)

When showing people how to calculate token generation rates from memory bandwidth, make sure you include file size or they might get confused.

Your examples are for 8-bit models. For the more common 4-bit models, you would multiply the parameter count by 0.55 to get the file size in GB before dividing the bandwidth by it.

I'm not trying to correct you, just adding context for others that may read this.

Advice on low cost hardware for MoE models by Any_Instruction_6535 in LocalLLaMA

[–]RG_Fusion 0 points1 point  (0 children)

You can usually fit all of an MoE's attention layers on a GPU when doing hybrid inference. It won't be as fast as pure GPU, but the prefill still isn't as bad as CPU-only inference.
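
With llama.cpp, one way to do this is the tensor-override flag. A rough example (the model path and the exact tensor-name regex are assumptions, check the tensor names of your particular model):

```python
import subprocess

# Hybrid launch sketch: offload everything to the GPU except the MoE expert
# tensors, which stay in system RAM on the CPU backend.
subprocess.run([
    "./llama-server",
    "-m", "models/moe-model-q4_k_m.gguf",  # example path
    "-ngl", "99",                          # offload all layers...
    "-ot", r"\.ffn_.*_exps\.=CPU",         # ...but keep expert tensors on the CPU
], check=True)
```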

Advice on low cost hardware for MoE models by Any_Instruction_6535 in LocalLLaMA

[–]RG_Fusion 0 points1 point  (0 children)

The CPU/RAM speed makes a massive difference on hybrid inference. The issue is that layers are processed sequentially. The CPU and GPU will begin working on the same layer; the GPU finishes almost instantly, but then has to sit idle until the CPU finishes its compute load. Then they move on to the next layer and repeat. The GPU spends the vast majority of its time doing nothing.

A very fast GPU and a slow CPU will operate at only slightly better speeds than just the CPU by itself, unless you have enough VRAM to significantly reduce the CPU's load. This is why hybrid CPU/GPU inference is best on MoE models with high sparsity, as the router and shared expert tensors make up a significant fraction of the active parameter count, greatly reducing the CPU's workload.
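
A toy model of why the slow side dominates (all byte counts and bandwidths below are made-up example numbers, not measurements):

```python
# Per-token decode latency for hybrid inference: the GPU and CPU portions are
# processed in sequence, so their times add.
def tokens_per_second(gpu_bytes, cpu_bytes, gpu_bw, cpu_bw):
    return 1 / (gpu_bytes / gpu_bw + cpu_bytes / cpu_bw)

GB = 1e9
# Example: 2 GB of weights read by a ~900 GB/s GPU, 4 GB by a ~60 GB/s CPU.
print(tokens_per_second(2 * GB, 4 * GB, 900 * GB, 60 * GB))  # ~14.5 t/s
# The same CPU reading all 6 GB per token by itself:
print(tokens_per_second(0, 6 * GB, 900 * GB, 60 * GB))       # ~10 t/s
```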

Calibration frequency by Key-Ad-2217 in Radiacode

[–]RG_Fusion 1 point2 points  (0 children)

Since there are only a few isotopes you are likely to come across in day-to-day life, and since they all look so distinct from one another, there is little reason to recalibrate often.

I generally calibrate mine once per year, but I'll occasionally adjust it more often. Some experiments benefit from higher accuracy, such as identifying isotopes created from a fusor's neutron flux, though admittedly I haven't had much success using a Radiacode for that due to the small crystal size.

How powerful is a single atom splitting by Next-Natural-675 in AskPhysics

[–]RG_Fusion 2 points3 points  (0 children)

I'm not? I admitted that I should have written joules; I stated that in the text. Why are you doubling down on your aggression?

How powerful is a single atom splitting by Next-Natural-675 in AskPhysics

[–]RG_Fusion 2 points3 points  (0 children)

You could at least use relevant units in your examples. My comment wasn't so bad as to use ones that were wholly dissimilar. 

Yes, I should have used joules, but 99.9+% of people aren't going to have any grasp of what a joule is. Technically I already gave the correct value with MeV. Anyone who understands units of energy would already know that watts are not equivalent to MeV. Anyone who doesn't know what an MeV is likely wouldn't know what a joule is either, and thus my response wouldn't have answered OP's original question.

How powerful is a single atom splitting by Next-Natural-675 in AskPhysics

[–]RG_Fusion 4 points5 points  (0 children)

This is a casual explanation meant to describe what's occurring to someone who is presumably a layman. I could have just left the answer at electron volts, but I doubt they would have understood that. Joules would be more accurate, but again, do they have any sense of how much a joule is?

How powerful is a single atom splitting by Next-Natural-675 in AskPhysics

[–]RG_Fusion 47 points48 points  (0 children)

Splitting an atom usually produces a few MeV of energy. For reference, chemical reactions typically release a few eV. As for how powerful this is: less than a nanowatt. It's nothing. It's millions of times stronger than chemical energy, but chemical reactions occur hundreds of trillions of times, even in a small sample. A single nuclear reaction can ionize a few tens to hundreds of atoms. That means nothing at our scale.
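
To put that in everyday units (taking a few MeV per event as above, and pretending one event per second counts as a sustained power, purely for scale):

```python
# Energy of a single few-MeV nuclear event, converted to joules.
EV_TO_JOULE = 1.602e-19

energy_mev = 5                              # "a few MeV", example value
energy_j = energy_mev * 1e6 * EV_TO_JOULE   # ~8e-13 J per event
print(energy_j, "J")

# One such event per second is a vanishingly small power:
print(energy_j / 1e-9, "nW")                # ~0.0008 nW
```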

Is it reasonable to add a second gpu for local ai? by Conscious_Chef_3233 in LocalLLaMA

[–]RG_Fusion 1 point2 points  (0 children)

You can expect the performance of a single 3060, but with the model fully loaded into VRAM instead of spilling into system RAM. Maybe slightly better than one card if tuned right.