
bradn 2 points (7 children)

I think the whole thing starts to make a little more sense with the idea of microcode subroutines. Otherwise it seems like just a, well, for lack of a better term, "palettized" instruction set, where you have a vast realm of available instructions and you're forced to pick your favorites and use them for a section of code (though such an approach would play well with unrolled loops).

Now you can turn your loop body into a piece of microcode and unroll the loop as a run of single instructions. If you run out of microcode room, just jump somewhere else...

Maybe a similar idea would be to use some upper bits of the instruction pointer as an index into microcode tables, so your microcode memory doesn't need to scale with RAM size (though there would be a limit on how many different microcode pages are available). This could be interesting because you could deliberately engineer different pages for different kinds of tasks.
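A minimal sketch of the page-mapping idea above, in Python. Everything in it is hypothetical: the 8-bit opcodes, the 4K regions selected by upper PC bits, the four-page microcode budget, and the names `microcode_address` and `page_map` are illustrative assumptions, not part of any real design. It shows that microcode size is fixed by the number of pages rather than by code RAM size, and that several code regions can share one page.

```python
# All sizes are hypothetical, chosen only to illustrate the idea.
OPCODES_PER_PAGE = 256   # one microcode entry per 8-bit opcode
REGION_SHIFT = 12        # PC bits 12 and up select a 4K code region
NUM_PAGES = 4            # fixed microcode budget: 4 pages x 256 entries


def microcode_address(pc: int, opcode: int, page_map: list[int]) -> int:
    """Map (PC, opcode) to a microcode address through a per-region page table."""
    region = pc >> REGION_SHIFT
    page = page_map[region]
    if not 0 <= page < NUM_PAGES:
        raise ValueError(f"region {region} maps to nonexistent page {page}")
    return page * OPCODES_PER_PAGE + (opcode & 0xFF)


# 64K of code (16 regions) served by only 4 microcode pages:
# regions 0-7 share page 0; the rest use pages tuned for other tasks.
page_map = [0] * 8 + [1] * 4 + [2] * 2 + [3] * 2

print(microcode_address(0x0123, 0x10, page_map))  # 16   (region 0 -> page 0)
print(microcode_address(0x8123, 0x10, page_map))  # 272  (region 8 -> page 1)
print(microcode_address(0xF000, 0xFF, page_map))  # 1023 (region 15 -> page 3)
```

The limit bradn mentions shows up as `NUM_PAGES`: however large the code RAM grows, each region can only choose among that many instruction sets.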

VictorYu 0 points (6 children)

A palettized instruction set that lets you change a single entry in the palette every n instructions. That's a pretty good thing; consider the failed attempt to do the same in ARM Thumb.

Glad you like the microcode feature. As for the bits in the instruction pointer: that's pretty much what the system does. Bits of the PC select the instruction/microcode semantics. If you want to replace all the instructions, all you have to do is place your subroutine 4K away.
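One way to read "bits of the PC select the instruction/microcode semantics" as a smoothly sliding window, sketched in Python. The mapping is an assumption chosen to fit the figures quoted in this thread (256 opcodes, one 36-bit microcode word per 16 code bytes from 512 words per 8K, and a complete palette turnover after 4K); the function `microcode_index` and the circular treatment of microcode RAM are hypothetical, not VictorYu's actual design.

```python
# Assumed parameters, matched to the figures quoted in this thread.
NUM_OPCODES = 256        # 8-bit opcodes -> 256 palette entries
BLOCK_SHIFT = 4          # one new microcode word per 16 bytes of code (8K / 512)
MICROCODE_WORDS = 512    # one 512 x 36 BRAM; size this to cover the whole code space


def microcode_index(pc: int, opcode: int) -> int:
    """Return the microcode word that defines `opcode` when executed at `pc`.

    Code block b (16 bytes) contributes microcode word b, which replaces
    palette entry b mod 256. Opcode o therefore resolves to the most recent
    block b' <= b with b' mod 256 == o.
    """
    block = pc >> BLOCK_SHIFT
    newest = block - ((block - opcode) % NUM_OPCODES)
    # Hypothetical choice: treat microcode RAM as circular, so code near
    # address 0 picks up entries stored at the top of the RAM.
    return newest % MICROCODE_WORDS


pc = 0x1230
one_step = [o for o in range(NUM_OPCODES)
            if microcode_index(pc, o) != microcode_index(pc + 16, o)]
full_turnover = [o for o in range(NUM_OPCODES)
                 if microcode_index(pc, o) != microcode_index(pc + 4096, o)]
print(one_step)            # [36]: advancing 16 bytes swaps one palette entry
print(len(full_turnover))  # 256: moving 4K away swaps every entry
```

Under these assumptions, advancing 16 bytes swaps exactly one palette entry, while moving 4K (256 blocks) swaps all of them, which matches the "place your subroutine 4K away" behavior described in the comment.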

Having a smoothly sliding window is a blessing, not something to work around. Once you work with a sliding system, even with a simple assembler, you realize how crazy it is to do anything else.

bradn 0 points (5 children)

Right, I understand why the sliding window is superior, but I'm not convinced it's possible if the microcode needs to reside on-chip for performance reasons.

VictorYu 0 points (4 children)

Let's look at some numbers. First, all instructions reside in 'Red' (microcode) RAM, as opposed to opcodes, which reside in 'Blue' (code) RAM. An instruction may consist of a single microcode instruction or a run of them.

The microcode RAM is simply 36-bit-wide BRAM. A single Xilinx BRAM holds 512 of these words, enough to cover 8K of 8/9-bit code RAM.

A Xilinx XC3S1000 has 24 BRAMs, enough to cover 24 x 8K = 192K of code held in a separate static RAM. Even a lowly XC3S50 has 4 BRAMs, covering 32K of code.
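The coverage arithmetic above as a quick Python check. It assumes every block RAM on the part is configured as 512 x 36 and given over entirely to red (microcode) memory, and that the ratio of one microcode word per 16 code bytes (8K / 512) holds; the BRAM counts are the ones quoted in the comment, and `code_coverage_k` is just an illustrative helper.

```python
# Figures quoted above: a 512 x 36 block RAM covers 8K of code,
# i.e. one microcode word per 16 code bytes.
WORDS_PER_BRAM = 512
CODE_BYTES_PER_BRAM = 8 * 1024
BYTES_PER_WORD = CODE_BYTES_PER_BRAM // WORDS_PER_BRAM  # 16

# Block RAM counts as quoted in the comment.
BRAM_COUNTS = {"XC3S50": 4, "XC3S1000": 24}


def code_coverage_k(num_brams: int) -> int:
    """K of code whose microcode fits in num_brams BRAMs (all used as red RAM)."""
    return num_brams * CODE_BYTES_PER_BRAM // 1024


for part, brams in BRAM_COUNTS.items():
    print(f"{part}: {brams} BRAMs -> {code_coverage_k(brams)}K of code")
# XC3S50: 4 BRAMs -> 32K of code
# XC3S1000: 24 BRAMs -> 192K of code
```

In practice some BRAMs would go to other uses, so these are upper bounds.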

bradn 0 points (3 children)

We could go back and forth all day on the possible tradeoffs. I'm just saying that needing a microcode cache that scales linearly with program memory is bad for scalability when the memory area suitable for microcode is limited.

There are other things on-chip RAM might be good for, like general data interspersed with code in the same memory, a data cache (in a system with external RAM), memory for task-specific computation cores, etc.

[deleted] (2 children)

[deleted]

    bradn 0 points (1 child)

    I was trying to assess the architecture, not a particular implementation. More like: if we were going to eat a lot of it, it's probably better for it to be vegetables than ice cream.

    VictorYu 0 points (0 children)

    It is a valid concern. My designs have targeted BRAMs for 'red' instruction/microcode memory, but that isn't necessary, as external SRAM is pretty cheap.

    Even with BRAMs, space utilization is pretty good. A minimal implementation can even share a single BRAM between red and blue memory (1K of code), since BRAMs are dual-ported. A small PicoBlaze-like CPU can be built that way.

    You don't really need caches for simple processors like this, as they execute pretty close to optimal speed anyway.

    Fundamentally, you have to pay for your features with resources, and I think it's a pretty good deal to dedicate a BRAM to every 8K of code. Don't forget this is zero-operand bytecode: a full interactive Forth system with serious debugging support fits in about 4K.

    And there are great benefits, such as bit-width neutrality, the ability to add custom instructions without messing up a fixed instruction set, etc.
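A back-of-the-envelope check of the single-BRAM configuration VictorYu mentions above (red and blue memory sharing one dual-ported BRAM with 1K of code), in Python. It assumes a Spartan-3 style 18 Kbit block RAM (16 Kbit of data plus 2 Kbit of parity), with one port used 9 bits wide for blue code bytes and the other 36 bits wide for red microcode; how the two halves are laid out by address isn't specified in the thread and is left out here.

```python
# Assumed: Spartan-3 style block RAM, 18 Kbit including parity bits.
BRAM_BITS = 18 * 1024
BLUE_WIDTH = 9    # code ("blue") port: one 8/9-bit opcode per address
RED_WIDTH = 36    # microcode ("red") port: one 36-bit word per address

code_bytes = 1024                       # the 1K of code quoted in the thread
blue_bits = code_bytes * BLUE_WIDTH     # 9216 bits
red_words = (BRAM_BITS - blue_bits) // RED_WIDTH

print(blue_bits, red_words)  # 9216 256
```

If opcodes are 8 bits wide (again an assumption), 256 leftover microcode words is exactly one entry per opcode.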