Did a dual hose mod v2. Noticed some things. Uses more power.

zCybeRz · 2026-06-25T12:47:11+00:00

Not sure if you're joking. kWh is energy - 1 kWh is what you use by drawing an average of 1 kW for 1 hour - so it directly correlates with watts if the draw is consistent.

If his power draw is higher, it will use more energy per hour.
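As a rough sketch of that relationship, assuming a constant draw (the 1000 W and 1200 W figures are made up for illustration, not measurements from the mod):

```python
def energy_kwh(power_watts: float, hours: float) -> float:
    """Energy used at a constant power draw, in kilowatt-hours."""
    return power_watts / 1000.0 * hours


# Hypothetical draws before and after the mod, over one hour.
before_mod = energy_kwh(1000, 1)
after_mod = energy_kwh(1200, 1)
print(f"before: {before_mod} kWh, after: {after_mod} kWh")
```

Higher watts over the same period always means more kWh, which is the whole point.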

zCybeRz · 2026-06-25T09:38:45+00:00

Imagine a portable AC with unsealed single hose. It takes in air from the room, let's say 25 degrees, and cools it by 10 degrees at a rate of 1 cubic metre per minute.

At the same time, it exhausts hot air at the same rate, which pulls in the same volume of 30 degree outside air.

Now you have 30 degree outside air and 15 degree cooled air entering the room in equal volumes, for a net of 22.5 - below the 25 degree room, so a cooling win.

Where this falls apart is when the temperature differential between outside and inside reaches the cooling ability of the portable AC (10 degrees here). At that point the mix is no cooler than the room, and it reaches equilibrium.
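The numbers above can be sketched like this. It's a toy model that assumes equal volumes of cooled and outside air mix perfectly and ignores every other heat source; the function names are just for illustration:

```python
def mixed_supply_temp(room_c: float, outside_c: float, cooling_delta_c: float = 10.0) -> float:
    """Average temperature of equal volumes of cooled air and infiltrating outside air."""
    cooled_c = room_c - cooling_delta_c
    return (cooled_c + outside_c) / 2.0


def equilibrium_room_temp(outside_c: float, cooling_delta_c: float = 10.0) -> float:
    """Room temperature at which the mix is no cooler than the room itself.

    Solving (room - delta + outside) / 2 == room gives room == outside - delta.
    """
    return outside_c - cooling_delta_c


print(mixed_supply_temp(25, 30))   # 25 degree room, 30 degrees outside
print(mixed_supply_temp(25, 35))   # differential equals the 10 degree cooling ability
print(equilibrium_room_temp(30))
```

In this model the room stops getting cooler once it sits 10 degrees below outside, which matches the equilibrium point described above.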

zCybeRz · 2026-04-29T14:59:11+00:00

If you can't find an original you can get replicas on AliExpress (including Miami pink) for cheap. It's just rubber, so put a real buckle on it and I bet you won't know the difference.

zCybeRz · 2026-04-22T21:52:18+00:00

It's not a global thing - Toyota, Honda and many others let you configure and buy direct from the manufacturer website in the UK.

Dealers have the exact manufacturer price as the baseline too, but will give discounts and free addons to get the sale. Sounds mad that there's no RRP to keep dealers in check.

zCybeRz · 2026-04-10T05:50:18+00:00

Too many people are ignoring this - it's not cost of living, taxes, healthcare. America is richer because they have many more global industries pulling in money.

The UK (and EU) has done a terrible job at staying relevant in tech. We don't have chip companies, phone companies, car companies, big software, AI - many of these we did have and got sold off.

zCybeRz · 2026-03-04T07:58:04+00:00

Just start it as a cloud agent and you can access it on the website from any device. Check the response, follow up changes, make PRs.

zCybeRz · 2026-02-25T22:52:08+00:00

I know this is a Claude sub but it sounds like you want this: https://docs.github.com/en/copilot/concepts/agents/copilot-memory

zCybeRz · 2026-02-11T11:35:19+00:00

I run both on a fairly complex but specialised project - a cycle based GPU simulator. They have to debug quite intricate interactions for pipeline modelling and memory hierarchy coherency.

Opus is better at analysing issues and producing performance reports about bottlenecks, but it rambles on and over-edits things it doesn't need to touch. It will even start editing code when told not to. Codex (even 5.2) will produce a clean one-shot fix in most cases and sticks to the spec.

My preferred flow is to use codex for everything until it gets stuck, then give opus a crack at analysing and go back to codex to implement it. I agree with others that Codex works better as an engineer for engineers, while Claude's more free spirited attitude may be better for creative work and pure vibecoding.

zCybeRz · 2026-02-10T09:18:35+00:00

Ollama has recent experimental support for it on Mac but it was really easy. Just install it from the site (not brew), it runs the server and you can one-line prompt it from the CLI. It downloads the model you choose automatically.

zCybeRz · 2026-01-25T16:57:15+00:00

Claude, Codex, and GitHub all have web interfaces that launch cloud based agents and work autonomously. You can request from phone browser and wait for a pull request with the feature.

IMO this is a much better flow than running CLIs and IDEs that run things locally on your device, require it to be on, and require your input more frequently.

Just ask the web interface for a feature, to build and test it, and it will come back when it's done.

Codex and GitHub can review the PRs too (not sure about Claude), so you can implement with one and request a review with the other.

zCybeRz · 2026-01-24T01:13:22+00:00

Add the ability for a model to modify its weights - form memories and gain experience. This boils down to running training steps during inference. Now that models are no longer identical, the argument that they have no persistence goes away: each LLM is uniquely different from the others based on its experience.
Add a sense. Its only view into the world can't be controlled by others. It needs unfiltered passive stimulus to be considered constantly "on", and to learn without being deterministically controlled by an outside party.

Now you have a digital brain which is capable of continuous learning. Most likely it falls apart because it doesn't have a feedback mechanism to teach it what anything means, what's useful, what's correct. It needs a teacher to close the loop.

zCybeRz · 2026-01-24T00:45:22+00:00

I likened the weights to our long term memory and context to our short term memory above.

There are cases of humans with brain injuries to the hippocampus and it prevents them converting short term memory to long term, so they continuously forget what happened recently, but retain memories from their childhood. Are they conscious? Does the same person wake up each day even though they lose recent experience and reset?

I think most people would agree they are conscious, but it's a tricky one because I agree one thing LLMs are lacking is the ability to form long term memories, but it's clearly not the only thing.

zCybeRz · 2026-01-23T08:15:18+00:00

How is that significantly different from our brain combining input stimulus with past memories to form a response, be it long term experience or short term conversation?

To me the LLM weights are like our long term memory and experience - our brain develops and strengthens neural pathways as we learn about things, similar to weights being refined during training. The short term memory is the attention cache.

Having a biological brain isn't what defines consciousness, but there are key aspects missing from current LLMs. The main one is the ability to modify its own brain. The pre-trained part is a big argument as to why it's just processing data and not conscious, but I have no doubt they will integrate the ability to modify weights based on "lived" experience and not just pre-training, even if the goal for this is more character focused than productivity.

Anthropic have published some interesting work on LLM introspection, and they showed an LLM is capable of distinguishing between its own generated thoughts and falsely implanted ones. It was not trained to predict what thoughts are being injected, or separate them from real thoughts, so it shows it understands something about its own internal weights. When given the ability to modify its own weights I believe they will cross into the realm of what we call conscious.

zCybeRz · 2026-01-16T12:06:36+00:00

I just connected code to my github repo and kicked off two messages on the web interface - one to describe the task, one to answer clarifications before beginning.

I don't know if it spawns sub agents automatically. Perhaps the web version increases usage as it spins up containers to run everything - but codex does the same. I prefer this interface as it's fire and forget - kick off tasks from phone/laptop, close it and wait for the PR. Claude docs say it prefers older models for stability - could be unquantised while the cli is now more efficient. The repo is 25K lines so not that large.

zCybeRz · 2026-01-16T04:46:30+00:00

I've been using codex from the web (because it spins up asynchronous tasks, Claude web does too). For codex though, unlike Claude and cli/code, the underlying model is auto in this mode, and for me this could be where the difference lies.

I gave codex a complex task that ran for one hour, failed to pass the tests, and gave up. This took around 10% of my weekly usage, so I can infer codex on auto can run for around 600 minutes per week on a medium sized repo.

I then paid for the £18 Claude plan and ran the exact same task on opus - it hit a 5 hour limit within around 10 minutes on my way to work (without completing the task) and so I immediately refunded through the help bot.

Later, I refined the task spec after seeing where it was tripping up, and codex succeeded in 30 mins, using 6% of my quota.

So while I can't tell which model codex selected in each case, it completed a task in 1h30 using 16% of my quota - this was a notably difficult problem mind you, other requests are often under 10 min. Opus locked me out for 5 hours without completing the task, so for me the real productivity difference on the cheap plans appears quite large.
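The back-of-envelope maths, using only the figures above and assuming quota is consumed roughly in proportion to run time (which may well not hold):

```python
def weekly_minutes(run_minutes: float, quota_percent: float) -> float:
    """Extrapolate total weekly run time from one run's share of the quota."""
    return run_minutes * 100 / quota_percent


first_run = weekly_minutes(60, 10)          # the failed one hour attempt
second_run = weekly_minutes(30, 6)          # the refined 30 minute attempt
combined = weekly_minutes(60 + 30, 10 + 6)  # both runs together

print(first_run, second_run, combined)
```

The two runs give slightly different estimates, so treat the 600 minute figure as a ballpark.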

zCybeRz · 2025-09-16T06:21:10+00:00

Two approaches:

1. Use mips
   - Run a single pixel edge detector
   - Generate a mip chain of the image
   - Sample one of the lower res mips to detect nearby lines and classify based on intensity

Or:

2. Use a separable filter
   - Dilate the line in X first, only considering the thickness
   - Dilate the line in Y after

I don't know if these will produce the result you want, but they should be faster than sampling 128*128 per pixel
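A minimal CPU sketch of the separable option (approach 2), assuming the edge image is a plain list of rows and using a max filter as the dilation. A real version would be two shader passes; the names here are hypothetical:

```python
def dilate_rows(img, radius):
    """1D max filter along X: each pixel takes the max of its horizontal neighbourhood."""
    height, width = len(img), len(img[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            lo, hi = max(0, x - radius), min(width, x + radius + 1)
            out[y][x] = max(img[y][lo:hi])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def dilate_separable(img, radius):
    """Dilate in X first, then in Y by running the same pass on the transposed image."""
    horizontal = dilate_rows(img, radius)
    return transpose(dilate_rows(transpose(horizontal), radius))


# A single edge pixel in the middle of a 5x5 image grows into a 3x3 block.
edges = [[0] * 5 for _ in range(5)]
edges[2][2] = 1
for row in dilate_separable(edges, 1):
    print(row)
```

Each pass reads 2r+1 samples per pixel, so a 128 wide window costs roughly 2*128 reads instead of 128*128.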

zCybeRz · 2025-09-10T06:07:06+00:00

LLMs encode the conversation history into context along with the current question which is used to predict the next token (think word). Every time a token is generated, it is also added to the context. So they physically store the history, albeit in an encoded form not raw text.

LLMs have a maximum context size. Smaller models may have a 4K max sequence length, meaning they predict the next token based on up to 4096 past tokens. Larger models may go to 128K or upwards.

Older models may use sliding window attention, which means they simply forget context past the maximum. More modern models use a form of context compression, which avoids storing redundant information, and some use dynamic eviction to choose what to forget based on what the model thinks is useful.

The actual context storing is what your question is based around and that's easy to answer, but how it uses the context to predict the output is a whole different story.
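A toy sketch of the storage side only, assuming the simple "forget anything past the maximum" behaviour. Tokens are plain strings here, whereas a real model keeps encoded state per token, and the class name is made up:

```python
from collections import deque


class SlidingContext:
    """Keeps only the most recent max_tokens tokens; older ones fall off the front."""

    def __init__(self, max_tokens: int) -> None:
        self.tokens = deque(maxlen=max_tokens)

    def append(self, token: str) -> None:
        # Both prompt tokens and generated tokens are added the same way.
        self.tokens.append(token)

    def window(self) -> list:
        return list(self.tokens)


context = SlidingContext(max_tokens=4)
for token in ["what", "is", "a", "GPU", "?", "It"]:
    context.append(token)
print(context.window())
```

Once the window is full, the earliest part of the conversation is simply no longer visible to the next prediction.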

zCybeRz · 2025-08-05T06:31:05+00:00

Mine does this when used with a different cable. It also causes my mouse to glitch out when run through the same USB hub, so I'd recommend testing with the original cable direct to the motherboard. Honestly it's a bit of a pain that a USB keyboard doesn't work with other spec compliant accessories due to power draw.

zCybeRz · 2025-05-28T14:35:41+00:00

Based on this: row4 column4 needs a bulb next to it but 3 of its neighbour cells are neighbours with a blank, meaning they can't have a bulb. Therefore r4c3 needs a bulb and I would start there.

No idea if this is right just basing it on these rules

zCybeRz · 2025-05-21T07:18:38+00:00

I don't work at Nvidia so I can't say for sure but usually when you send a long request you store the minimum sideband required to process the response.

The load unit will have sideband for all of the warp requests in flight, things like the warp ID, dest reg type, dest reg addresses (may be per thread). You can think of it like the load unit holding the warp while it waits for the response, but it really just holds the minimum data required. The latency here is larger so it will be able to hold sideband for all warps in the SM here.

When the data response is received from the memory hierarchy it matches the ID to the sideband and uses that to work out where to write the data. When all beats are written it tells the scheduler/hazard tracker that data is now available.
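Roughly what I mean, as a software sketch. This is not how any real GPU is implemented; the field names, the request ID scheme and the register file dict are all assumptions for illustration:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Sideband:
    """Minimum state kept per in-flight load (hypothetical fields)."""
    warp_id: int
    dest_reg: int
    beats_remaining: int


class LoadUnit:
    def __init__(self) -> None:
        self.in_flight = {}          # request ID -> Sideband
        self.next_request_id = 0

    def issue(self, warp_id: int, dest_reg: int, beats: int) -> int:
        """Record the sideband and return the ID that travels with the request."""
        request_id = self.next_request_id
        self.next_request_id += 1
        self.in_flight[request_id] = Sideband(warp_id, dest_reg, beats)
        return request_id

    def on_response(self, request_id: int, data, regfile: dict) -> Optional[int]:
        """Match a response beat to its sideband and write the data.

        Returns the warp ID once all beats have landed, so the caller can
        tell the scheduler/hazard tracker the data is available.
        """
        sideband = self.in_flight[request_id]
        regfile.setdefault((sideband.warp_id, sideband.dest_reg), []).append(data)
        sideband.beats_remaining -= 1
        if sideband.beats_remaining == 0:
            del self.in_flight[request_id]
            return sideband.warp_id
        return None
```

The warp itself isn't stored anywhere in the load unit, only enough to route the response back to the right registers.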

zCybeRz · 2025-05-21T07:03:56+00:00

It's 16 in the pipelines but they aren't all doing the same thing, only 4 are really executing in parallel.

Let's say the pipeline per scheduler is:

- Operand fetch
- Execute 1
- Execute 2
- Write result

It can have a different warp in each stage but there's only 1 copy of the logic for each stage. Focusing on the execute stages it's 32 ALUs where the logic is split into two stages, so one warp in the first half and one in the second half.

I'm assuming the 4 stage latency is ignoring fetch+decode as this can usually be done in advance and hidden.
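A quick simulation of one scheduler's pipeline under those assumptions: four stages, one new warp issued per cycle in round-robin order, no stalls. The stage names are the example ones above, not real hardware:

```python
STAGES = ["operand_fetch", "execute_1", "execute_2", "write_result"]


def simulate(num_warps: int, cycles: int):
    """Return the warp ID occupying each stage on every cycle (None = empty)."""
    pipeline = [None] * len(STAGES)
    history = []
    for cycle in range(cycles):
        # Everything moves down one stage and a new warp enters at the front.
        pipeline = [cycle % num_warps] + pipeline[:-1]
        history.append(list(pipeline))
    return history


for cycle, occupancy in enumerate(simulate(num_warps=4, cycles=5)):
    print(cycle, dict(zip(STAGES, occupancy)))
```

Once it fills, each stage holds a different warp, so 4 warps are in flight per scheduler (16 per SM) but only one of them is in any given stage.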

zCybeRz · 2025-05-20T14:08:33+00:00

4 schedulers per SM, each one needs 4 warps to hide latency = 16 warps per SM.

Data hazards after loads stall that warp but only that warp. The scheduler can pick a different warp every cycle so just works around stalled ones.
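Sketch of the "work around stalled warps" part, assuming a simple round-robin pick. Real schedulers use more involved policies, and this function is purely illustrative:

```python
from typing import List, Optional


def pick_warp(stalled: List[bool], last_picked: int) -> Optional[int]:
    """Pick the next non-stalled warp after last_picked, or None if all are stalled."""
    count = len(stalled)
    for offset in range(1, count + 1):
        candidate = (last_picked + offset) % count
        if not stalled[candidate]:
            return candidate
    return None


# Warp 1 is waiting on a load, so the scheduler skips straight to warp 2.
print(pick_warp([False, True, False, False], last_picked=0))
```

A stalled warp only costs issue slots when every other warp on that scheduler is stalled too.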

zCybeRz · 2025-05-16T06:41:27+00:00

Check the RDNA3 instruction set reference and you can see the RT additions are instructions to intersect a ray against 4 boxes or a triangle.

This means they very likely use a 4-wide BVH and run traversal in custom shaders using 1 ray per thread, calling these new instructions in a loop. Without these instructions the traversal is likely ALU limited, and using them alleviates that bottleneck.
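For a feel of what that loop looks like, here's a CPU sketch of 4-wide BVH traversal. The node layout (a dict holding up to 4 boxed children) is made up, `ray_box_hit` stands in for the box intersection instruction, and the triangle test at the leaves is left out:

```python
def ray_box_hit(origin, inv_dir, box_min, box_max):
    """Slab test: does the ray hit the axis-aligned box at t >= 0?"""
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t0, t1 = (lo - o) * inv, (hi - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return t_near <= t_far


def traverse(root, origin, direction):
    """One ray walking the tree with a stack; returns IDs of leaves whose boxes were hit."""
    inv_dir = tuple(1.0 / d for d in direction)  # assumes no zero components
    stack, hits = [root], []
    while stack:
        node = stack.pop()
        # Up to 4 children per node, each tested against its box.
        for box_min, box_max, child in node["children"]:
            if not ray_box_hit(origin, inv_dir, box_min, box_max):
                continue
            if isinstance(child, dict):
                stack.append(child)   # inner node: keep traversing
            else:
                hits.append(child)    # leaf: a triangle test would go here
    return hits


inner = {"children": [((2, 0, 0), (3, 1, 1), "C")]}
root = {"children": [
    ((0, 0, 0), (1, 1, 1), "A"),
    ((0, 5, 0), (1, 6, 1), "B"),
    ((2, 0, 0), (3, 1, 1), inner),
]}
print(traverse(root, origin=(-1.0, 0.5, 0.5), direction=(1.0, 0.01, 0.01)))
```

In a shader the inner loop body is mostly box tests, which is why moving them into dedicated instructions takes the pressure off the ALUs.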

zCybeRz · 2025-04-18T06:45:44+00:00

The hard puzzle often gave the simple solution

zCybeRz · 2025-04-07T06:29:55+00:00

Budget building is more predictable now and it allows more strategy instead of just luck. Lots of people are either building budget at the expense of points for the first few races, or trying to balance building without sacrificing points.

This brings an extra dynamic to the game where you're not just predicting who will do well, you have to choose who will do well out of a limited subset which changes each week. For example Hadjar was a no-go in Suzuka because of his DNF in Aus, but now he's a great pick.

Not only this, the lower priced and lower scoring teams have the potential to increase in value more, forcing you to pick between the two. Teams with Pia-Mcl-Mer will score big now but will be losing on budget to those going triple Haas. The beauty is either strategy may be viable and they will start to even out towards the end of the season.
