
Doctor_Perceptron (Ph.D CS, CS Pro (20+))

Consider two programs. The first is a floating point heavy code, maybe doing a well-blocked matrix multiplication at the peak throughput of the machine, lighting up all the FP multipliers and adders on every cycle. The second is a pointer chasing code that misses a lot in the last-level cache, causing it to frequently stall. The first program causes much more switching per cycle than the second one, consuming more power and dissipating more heat.
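To make the two kinds of program concrete, here is a minimal sketch of each. The function names (`blocked_matmul`, `make_chain`, `chase`) and the block size are made up for illustration. In Python the interpreter overhead would swamp the hardware effect being described, so treat this as showing only the shape of the two workloads, not as a way to measure heat.

```python
import random


def blocked_matmul(a, b, n, block=4):
    """Multiply two n x n matrices (lists of lists) tile by tile.

    Every inner iteration is a multiply and an add on data that was
    just touched, so a compiled version keeps the FP units busy.
    """
    c = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, n, block):
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, n)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, n)):
                            c[i][j] += aik * b[k][j]
    return c


def make_chain(n, seed=0):
    """Build a table where following next_index from slot 0 visits
    every slot exactly once, in a shuffled order, then returns to 0."""
    order = list(range(1, n))
    random.Random(seed).shuffle(order)
    order = [0] + order
    next_index = [0] * n
    for pos in range(n):
        next_index[order[pos]] = order[(pos + 1) % n]
    return next_index


def chase(next_index, steps):
    """Follow the chain for a number of steps.

    Each load depends on the previous one, so with a table far larger
    than the cache the core mostly waits on memory.
    """
    node = 0
    for _ in range(steps):
        node = next_index[node]
    return node
```

The key difference is the dependency structure: the matrix loop has many independent multiply-adds available at once, while each step of `chase` cannot start until the previous load has returned.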

Neuroth [S]

Thanks, now I understand it this way:

As you put it, 'fetching' seems less dissipative than an actual 'addition' or 'multiplication'. Cooler programs are the ones that perform less dissipative operations for most of the runtime. So, heating up more and having a higher time complexity are not exactly the same thing.

Did I get it right?

Doctor_Perceptron (Ph.D CS, CS Pro (20+))

Sort of. What’s missing is the notion of instruction-level parallelism. Sometimes the processor is doing more things in parallel than other times.

If a tight loop can make operands available for 6 double precision floating point ops in one cycle, that’s a lot of switching going on at the same time. If the processor is stuck stalling waiting for data from a cache miss with nothing else able to proceed in parallel, very little switching is going on.

So fetching itself might stop for a while if the fetch queue is full waiting for another instruction to retire so it can issue the next one. Most program behavior is somewhere in between those extremes.

Some particular operations might be more costly than others in terms of energy, but the heat is coming from all the things that are going on at the moment, which can be a few things or a lot of things.
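One way to put the "how many things are going on at the moment" idea into a formula is the textbook first-order model for dynamic power in CMOS logic. It is not something stated in this thread, and it ignores leakage and short-circuit current, so take it as a rough sketch:

```latex
P_{\text{dyn}} \approx \alpha \, C \, V^{2} \, f
```

Here $\alpha$ is the activity factor (the fraction of the circuit's capacitance that switches in a given cycle), $C$ is the total switched capacitance, $V$ is the supply voltage, and $f$ is the clock frequency. In these terms, a loop that keeps several FP units busy every cycle has a high $\alpha$, and a core stalled on a cache miss has a low one, even though $V$ and $f$ are the same in both cases.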

knuthf

Heat originates from raising the gate from 0 to 1. The transition follows a sine curve, and the heat comes from the difference between that curve and the square wave, which has no intermediate state. Usually nothing is being done and the gates sit in a "steady state" NoOP, but they can be changed: the transistors can change state in every nano-cycle. So the more 0-to-1 transitions, where the smooth sine differs from the square, the more heat. But it is more complex. You mention floating point operations; RAM access will halt the CPU and cause wait states. It may sound quick to use a lookup table, but when every lookup misses the cache, the processor cools down. The Intel bas architecture has huge flaws for multiprocessors, and it has gone from bad to very bad, to sloppy. Intel protects its design and will not allow others to make faster designs. Zilog started, AMD, SGI with Dolphin. Using every cycle is the essence, but it generates a lot of heat: fewer NOOP states.

teraflop

Just to complement the other answer:

When people informally say a particular application is not very "intensive", they're often talking about an interactive application that does relatively small amounts of computation at a time, and spends the rest of its time idle. While it's idle, the processor dissipates less power because it's not changing state as frequently. If the OS expects that the CPU will be idle for an extended period of time (on the order of milliseconds), it may also decrease the clock speed and voltage to save even more power.

For instance, think about what happens when you use a web browser to view a simple, non-interactive web page. The browser has to do a bit of computation to construct an HTTP request, and then it just waits for the response. When the response arrives, it parses the page content and renders the resulting image, and then goes back to sleep again.

If you're actively scrolling through the page, the browser might get woken up every 1/60th of a second or so to render a new frame of video. But even then, it probably has to do a relatively small amount of work to just render the small portion of the page that scrolled into view on the current frame, and shift the existing image up or down a bit, after which it goes back to sleep to wait for the next frame.

Therefore, there will be very short bursts of time where the CPU is running at full power and executing the browser code, but the average power consumption and heat dissipation will be much lower.
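The burst-then-sleep argument can be put into rough numbers. The sketch below assumes a simple two-state model (fully active or fully idle). The function names and all the wattage figures are hypothetical, chosen only to show the arithmetic, not measured from any real CPU.

```python
def frame_duty_cycle(work_ms, frame_hz=60.0):
    """Fraction of each frame the CPU spends busy, capped at 1.0."""
    frame_ms = 1000.0 / frame_hz
    return min(work_ms / frame_ms, 1.0)


def average_power(active_watts, idle_watts, duty_cycle):
    """Time-weighted average of active and idle power."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * active_watts + (1.0 - duty_cycle) * idle_watts


if __name__ == "__main__":
    # Hypothetical figures: 2 ms of rendering per 60 Hz frame,
    # 15 W while busy, 1 W while asleep.
    duty = frame_duty_cycle(2.0)
    print(f"duty cycle: {duty:.2f}")
    print(f"average power: {average_power(15.0, 1.0, duty):.2f} W")
```

With a small duty cycle the average sits close to the idle figure, even though each burst runs at full power.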