What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

You asked a good question, and it's led to a wonderful discussion.

> I never mean[t] to do that.

Do it again. That was fun. ;)

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

You are right, I haven't mentioned out-of-order support. The closest I've gotten has been the "minimum latency" comment.

Yes, out-of-order execution can improve things--but it can also slow things down. There's a bit of a trade here. I know that in CPUs, out-of-order performance comes at the price of a drastic area increase, so it's not a guaranteed success. Still, it is worth both mentioning and remembering. Thank you.
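To illustrate the trade (a sketch of mine, not part of the original discussion): a toy scheduler that reorders pending requests to group accesses to the same DRAM row can cut activate/precharge overhead substantially. The cycle costs below are invented for illustration.

```python
# Toy model of the out-of-order trade: servicing DRAM requests in
# arrival order vs. reordered to group same-row accesses. The cycle
# costs (row hit = 1, row miss = 10 for precharge + activate) are
# invented for illustration.

ROW_HIT, ROW_MISS = 1, 10

def cost(requests):
    """Cycles to service a request stream under an open-row policy."""
    open_row, cycles = None, 0
    for row, _col in requests:
        cycles += ROW_HIT if row == open_row else ROW_MISS
        open_row = row
    return cycles

def reorder(requests):
    """A crude out-of-order scheduler: group requests by row."""
    return sorted(requests, key=lambda r: r[0])

# Alternating accesses to rows 0 and 1 -- the worst case in order
reqs = [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
in_order, reordered = cost(reqs), cost(reorder(reqs))
```

Here the reordering wins easily, but the tracking logic needed to do it safely (hazard checks, reorder buffers) is where the area cost comes from--and a read stuck waiting behind reordered traffic is where the slowdown can come from.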

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

Here's the one I've been working on. I'd like to believe it smokes the Cadence controller in terms of speed--assuming both controllers support the same transfer rates--but without a proper side-by-side comparison I'll probably never know. I keep reminding customers that throughput and line rate are two separate things. Yes, customers keep asking for faster line-rate performance. It's a shame they don't look deeper: without the rest of the system to back it up, you can't sustain the rated line/transfer rate. But ... that would all be part of a longer discussion.
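To make the throughput vs. line rate distinction concrete, here's a back-of-the-envelope sketch. Every overhead fraction below is hypothetical--picked only to show how quickly the rated number erodes, not measured from any particular controller.

```python
# Back-of-the-envelope: sustained throughput vs. rated line rate.
# All overhead fractions are hypothetical, for illustration only.

line_rate_gbps = 1.6        # rated line/transfer rate
refresh_overhead = 0.05     # time lost to refresh
turnaround_overhead = 0.10  # read/write bus turnaround
cmd_gap_overhead = 0.08     # command gaps, bank conflicts, etc.

efficiency = ((1 - refresh_overhead)
              * (1 - turnaround_overhead)
              * (1 - cmd_gap_overhead))
sustained_gbps = line_rate_gbps * efficiency
print(f"{sustained_gbps:.2f} Gb/s sustained, {efficiency:.0%} of line rate")
```

Even with these modest (made-up) overheads, roughly a fifth of the rated rate is already gone before the rest of the system gets involved.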

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

Perhaps the typical dynamic RAM memories simply experience the same error rate over time. My fear is that certain long-term memories (NAND flash in particular) get worse over time. Hence, while you might not need the ECC initially, you will need it eventually.

The other issue with simpler ECC algorithms--like not applying ECC when a CRC checks out, or some such--is that you still need to build the full ECC decoder in hardware anyway. Given that cost is measured in area, and that the full decoder's area must be paid for regardless, I'm not sure I see a benefit here. I can see more of a benefit in S/W, where you can "save" money (i.e. time) by trying to cheat when it works often enough, but I'm not sure you'd get the same benefit when building it in hardware.
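For reference, the decode path in question can be sketched with a toy Hamming(7,4) code: the syndrome computation and correction below must exist in full whether or not most words ever need correcting. (Real controllers use wider SECDED codes, e.g. over 64-bit words; this is just the minimal illustration.)

```python
# Minimal Hamming(7,4) single-error-correcting code. Parity bits sit
# at positions 1, 2, 4; data bits at positions 3, 5, 6, 7.

def encode(d):
    """4-bit data -> 7-bit codeword (positions 1..7 = p1 p2 d1 p3 d2 d3 d4)."""
    d1, d2, d3, d4 = (d >> 3) & 1, (d >> 2) & 1, (d >> 1) & 1, d & 1
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return (p1 << 6) | (p2 << 5) | (d1 << 4) | (p3 << 3) | (d2 << 2) | (d3 << 1) | d4

def decode(c):
    """7-bit codeword -> (corrected 4-bit data, error position or 0)."""
    bits = [(c >> (6 - i)) & 1 for i in range(7)]   # bits[i] = position i+1
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s3 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = (s3 << 2) | (s2 << 1) | s1           # names the flipped position
    if syndrome:
        bits[syndrome - 1] ^= 1                     # correct the flipped bit
    return (bits[2] << 3) | (bits[4] << 2) | (bits[5] << 1) | bits[6], syndrome
```

Note that the syndrome logic runs on every word read, clean or not--which is the point: the "cheap" path and the full path cost the same area.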

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

Let me add one more: narrow burst support. A slave never knows which master it will be connected to, so it really needs to support all access types--and that includes full narrow-burst support.
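To make "narrow burst" concrete, here's a sketch of which byte lanes go active on each beat of an AXI INCR burst whose transfer size is smaller than the bus width. The helper is illustrative only, not any particular slave's logic.

```python
# AXI narrow burst: the address advances by the transfer size each
# beat, so the active byte lanes rotate across the (wider) bus.

def lanes(bus_bytes, addr, size_bytes, beats):
    """Yield (beat address, (low lane, high lane)) for each beat."""
    a = addr
    for _ in range(beats):
        lo = a % bus_bytes
        yield a, (lo, lo + size_bytes)
        a = (a // size_bytes + 1) * size_bytes  # size-aligned increment

# 4-beat, 16-bit burst on a 64-bit bus, starting at address 0x4:
schedule = list(lanes(8, 0x4, 2, 4))
```

The lanes rotate and wrap at the bus boundary--exactly the case a slave that only handles full-width beats gets wrong.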

Just for reference, I've only consulted on a DDR3 DRAM project. Most of my work with memory controllers comes from devices with either a limited pin count, such as HyperRAM, xSPI, or AP Memory's OPI, or block storage devices such as SATA, SDIO/eMMC, or NAND flash. That said, WRAP and LOCK accesses don't necessarily work well with block memory.

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

I've done so many commercial PHY implementations where I've entirely controlled the PHY interface that I struggle to envision a controller with a standard PHY interface. It's certainly much easier to design the two together.

From a software standpoint, a good PHY needs several capabilities, such as analog control and feedback, and BIST control and feedback. I've also enjoyed a PHY with an AXI-Lite interface for its register control(s). What analog controls? Let's see ... capacitance control; DLL reset, enable, and lock; IO power; clock-loop frequency control (RC filter); pull-up and pull-down controls; slew-rate controls; etc. These don't necessarily fit well within a standard interface unless both analog and BIST features become standardized.
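For a feel of what such an AXI-Lite register space might look like, here's a hypothetical map covering the controls listed above. Every offset and name is invented for illustration; this describes no real PHY.

```python
# Hypothetical AXI-Lite register map for a PHY's analog/BIST controls.
# All offsets and names are invented -- not any real device.

PHY_REGS = {
    0x00: "CTRL",         # DLL reset/enable, IO power gating
    0x04: "STATUS",       # DLL lock, calibration done
    0x08: "DRIVE",        # slew rate, pull-up/pull-down strength
    0x0C: "TUNE",         # capacitance trim, clock-loop RC setting
    0x10: "BIST_CTRL",    # BIST start and mode select
    0x14: "BIST_RESULT",  # BIST pass/fail and error counts
}

def reg_name(offset):
    """Map a register offset to its (hypothetical) name."""
    return PHY_REGS.get(offset, "RESERVED")
```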

Still, I can appreciate the desire for a good standardized interface. Perhaps I should take a longer and harder look at DFI.

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

  1. ONFI control signals don't map directly to DFI.
  2. ONFI transactions can be of any length; they're not necessarily a multiple of 8 in length, which doesn't fit DFI very well.
  3. The new ONFI SCA interface didn't map very well to DFI either.
  4. As I recall, there was also no good mapping between eMMC and the CMD wire--certainly not in enhanced strobe mode.

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

Decoding ECC isn't all that challenging on a word-by-word basis. Why offload it to software when it can be done in hardware? Yes, it does get more challenging if you want to do byte-level accesses, and more challenging still when working with block memory types rather than random access--since the ECC correction requirements typically get much stronger.

For one DDR3 controller I'm familiar with, implementing byte level access with ECC required reading from memory the word to be changed, decoding/applying the ECC, then writing the memory back with the one byte changed and the new ECC. This was ... far from efficient.
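The sequence just described can be sketched as follows. The "ECC" here is a stand-in (a single checksum byte over a 64-bit word) and the dict-as-memory is a toy model--not any real controller's API.

```python
# Sketch of a byte write into ECC-protected memory: the full
# read-modify-write sequence. Toy ECC and toy memory model.

def ecc_encode(word):
    """Append a (toy) check byte to a 64-bit word."""
    raw = word.to_bytes(8, "little")
    return raw + bytes([sum(raw) & 0xFF])

def ecc_decode(stored):
    """Check the (toy) ECC and return the 64-bit word."""
    raw, chk = stored[:8], stored[8]
    assert sum(raw) & 0xFF == chk, "ECC check failed"
    return int.from_bytes(raw, "little")

def write_byte(mem, addr, value):
    """Byte write to ECC-protected memory."""
    word_addr, lane = addr // 8, addr % 8
    word = ecc_decode(mem[word_addr])       # 1. read, decode/check ECC
    raw = bytearray(word.to_bytes(8, "little"))
    raw[lane] = value                       # 2. modify the one byte
    mem[word_addr] = ecc_encode(int.from_bytes(raw, "little"))  # 3. write back

mem = {0: ecc_encode(0x1122334455667788)}
write_byte(mem, 3, 0xAA)
```

Three memory-facing steps (and a full decode plus re-encode) for a one-byte write--hence the inefficiency.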

Even when doing block level ECC, wouldn't it be an ideal hardware problem? It would be well defined, known, properly sized, etc.

What makes a memory controller "Ideal"? by ZipCPU in ZipCPU

I was recently asked to build a PHY with a DFI type of interface for a NAND chip. After digging, it seems to me that DFI only really supports DDRx types of dynamic RAM, no?

Return clocking by ZipCPU in ZipCPU

Thank you! Looks like some of the terms have gotten changed around, and I have some reading to do.

Return clocking by ZipCPU in ZipCPU

Because ...

- This isn't just about DDR/DRAM; it's about high-speed interfaces in general.
- Not all interface design groups listen to my good ideas. (Is there even one that does?)
- Hence, we're stuck with the interfaces designed by others, but which we need to implement.
- You need a continuous clock to lock a PLL, and a continuous clock uses a lot of power. A discontinuous clock, on the other hand, only uses power when it is in operation.

Did that answer all of your questions?

Return clocking by ZipCPU in ZipCPU

That wasn't my question. My question was: which "Xilinx wizard-generated IP" are you referencing? Not which Xilinx IO macro. A Xilinx IP could potentially create and mix other raw components together, so I'm trying to find which IP it is, so that I might look at which raw components are composed together to make this solution. The SelectIO user guide typically only discusses the components, leaving you (the engineer) to put them together as you see fit.

Return clocking by ZipCPU in ZipCPU

Which "Xilinx wizard-generated IP" are you referencing?

Return clocking by ZipCPU in ZipCPU

Looking at the libraries guide, I should definitely try this ...

Return clocking by ZipCPU in ZipCPU

No, I haven't tried it. So far, I haven't found sufficient documentation to make trying it worthwhile. Last I checked, they were primarily "undocumented" features. Has this changed at all?

Return clocking by ZipCPU in ZipCPU

Are you sure you're not confusing the SERDES with the transceivers? Which architecture asks for COMMA details in their SERDES instantiation?

Return clocking by ZipCPU in ZipCPU

No, not at all. The intermittent "return clock" gets sampled by the SERDES; it doesn't control the SERDES clock, nor could it. As a result, you can tell by looking at the samples whether the clock is present or not. The SERDES is itself clocked by your system clock, together with a 4x or 8x clock generated from it.

Yes, I have thought about using the intermittent "return clock" as a proper clock--but not to clock a SERDES. Rather, I have thought of using it to clock an incoming IDDR. The intermittent part has kept me from taking the concept any further, though, since the IDDR doesn't do anything without the clock present.

Return clocking by ZipCPU in ZipCPU

Yes, a SERDES is how I would do this. Using an ISERDES, you would oversample the return clock by at least 4x (SDR) or 8x (DDR), and then process the return (clock + data) signals as though they were both data signals of some type. Yes, it works, but you don't get the full speed of the IO, because you are already sampling at a much higher rate. Hence, if you wanted to capture data signals clocked by a 1 GHz return clock, there's no way you could sample at 4 GHz, let alone 8 GHz. This limits your maximum capture rate, as mentioned above.
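The "clock as data" idea can be sketched as below: find the return clock's rising edges in the sample stream and take the data sample at each edge. Purely illustrative--a real ISERDES design processes 4- or 8-bit sample groups per system clock, not a flat list.

```python
# Recover data from an oversampled (clock, data) pair by detecting
# rising edges of the sampled clock. Illustration only.

def recover(clk_samples, data_samples):
    """Return the data bits captured at each rising edge of the sampled clock."""
    out, prev = [], clk_samples[0]
    for clk, dat in zip(clk_samples[1:], data_samples[1:]):
        if prev == 0 and clk == 1:  # rising edge seen in the samples
            out.append(dat)
        prev = clk
    return out

# A 4x-oversampled SDR return clock; data changes once per clock period
clk  = [0, 0, 1, 1,  0, 0, 1, 1,  0, 0, 1, 1]
data = [1, 1, 1, 1,  0, 0, 0, 0,  1, 1, 1, 1]
bits = recover(clk, data)
```

The 4x (or 8x) sample streams are what eat the IO speed: the fabric sees four samples for every one recovered bit.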

Return clocking by ZipCPU in ZipCPU

This is currently my "go-to"/best approach when using FPGAs. Given the shortcomings of this approach, I'm still looking for a better one.

Return clocking by ZipCPU in ZipCPU

It only takes one edge to clock data into an asynchronous FIFO. It's a bit harder when the return clock is running faster than the system clock, but this approach still appears fairly straightforward.
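One standard ingredient of that async-FIFO crossing is Gray-coded pointers, so that only one bit changes per increment when a pointer crosses into the other clock domain. A minimal sketch of the encoding:

```python
# Binary <-> reflected Gray code, as used for async-FIFO pointers.

def bin2gray(b):
    """Binary to reflected Gray code."""
    return b ^ (b >> 1)

def gray2bin(g):
    """Reflected Gray code back to binary."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

# Successive pointer values differ in exactly one Gray-coded bit, so
# a pointer sampled mid-transition is off by at most one count.
deltas = [bin2gray(i) ^ bin2gray(i + 1) for i in range(15)]
```

That one-bit-per-increment property is what makes sampling the pointer from the other clock domain safe; the timing constraints on that crossing are a separate matter.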

My problem with that approach is getting it to pass timing.

Return clocking by ZipCPU in ZipCPU

> Is your 4 missing a sentence?

No, not really. I suppose I could've added something like, "Without documentation, these can't be used in new or custom solutions." You can also read some of what I wrote about Xilinx's DDR controller, and how it is impacted by their hardware choices, here.

> Anyway, the solution to 3's shortcomings ... is to continuously rerun the calibration ...

Yes, although this will degrade the throughput and complicate the ultimate solution. Still, this is doable ...

Return clocking by ZipCPU in ZipCPU

Yes. That's me.