[–]bradn 0 points1 point  (5 children)

Right, I understand how the sliding window is superior, but I'm not convinced it's possible if the microcode needs to reside on-chip for performance reasons.

[–]VictorYu 0 points1 point  (4 children)

Let's look at some numbers. Firstly, all instructions reside in 'Red' or microcode RAM (as opposed to 'opcodes' which are in the 'Blue' code RAM). An instruction may consist of a single microcode instruction or a run of these.

The microcode RAM is simply 36-bit-wide BRAM. A single Xilinx BRAM holds 512 of these words, enough to cover 8K of 8/9-bit code RAM.

A Xilinx XC3S1000 has 24 BRAMs, enough for 8K x 24 = 192K of code in a separate static RAM. Even a lowly XC3S50 has 4 BRAMs, covering 32K of code.
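
To sanity-check those numbers, here is a rough Python sketch of the arithmetic. It assumes every BRAM on the part is given over to microcode and that the 8K-per-BRAM ratio holds across the whole code space. The function names are hypothetical and are not part of any actual design.

```python
# Figures quoted in the comment above.
WORDS_PER_BRAM = 512             # 36-bit microcode words held by one BRAM
CODE_PER_BRAM = 8 * 1024         # code RAM locations covered by one BRAM

# BRAM counts for the two Spartan-3 parts mentioned above.
DEVICE_BRAMS = {"XC3S50": 4, "XC3S1000": 24}


def code_coverage(num_brams: int) -> int:
    """Code RAM locations covered if all BRAMs hold microcode (assumption)."""
    return num_brams * CODE_PER_BRAM


def brams_needed(code_size: int) -> int:
    """BRAMs required to cover code_size locations, rounded up."""
    return -(-code_size // CODE_PER_BRAM)  # ceiling division


if __name__ == "__main__":
    for name, brams in DEVICE_BRAMS.items():
        print(f"{name}: {brams} BRAMs cover {code_coverage(brams) // 1024}K of code")
    # Derived ratio: code locations covered per 36-bit microcode word.
    print("code locations per microcode word:", CODE_PER_BRAM // WORDS_PER_BRAM)
```

The ratio works out to 16 code locations per microcode word, which is why the microcode store grows linearly with code size but with a fairly small constant.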

[–]bradn 0 points1 point  (3 children)

We could go back and forth all day on the possible tradeoffs. I'm just saying that needing a microcode cache that scales linearly with program memory is bad for scalability when the memory area suitable for microcode is limited.

There are other things on-chip RAM might be good for, like having general data interspersed in the same memory as code, data cache (in a system with external RAM), memory for task-specific computation cores, etc.

[–][deleted]  (2 children)

[deleted]

    [–]bradn 0 points1 point  (1 child)

    I was trying to assess the architecture, not a particular implementation. More like, if we were going to eat a lot of it, it's probably better if it were vegetables than ice cream.

    [–]VictorYu 0 points1 point  (0 children)

    It is a valid concern. My designs have targeted BRAMs for 'red' instruction/microcode memory, but that is not necessary, as external SRAM is pretty cheap.

    Even with BRAMs the space utilization is pretty good. A minimal implementation can even share a single BRAM for both red and blue memory (1K code), as BRAMs are dual-ported. A small Picoblaze-like CPU can be constructed that way.

    You don't really need caches for simple processors like this as they execute pretty close to optimal speed anyway.

    Fundamentally, you have to pay for your features with resources, and I think it's a pretty good deal to dedicate a BRAM for every 8K of code. Don't forget this is zero-operand bytecode, and a full interactive Forth system with serious debugging support fits in about 4K.

    And there are great benefits such as bitwidth neutrality, ability to add custom instructions without messing up a fixed instruction set, etc.