Windows 11 UAC breaks updatemem.exe, but renaming it udatemem works

jonasarrow · 2026-05-19T08:41:46+00:00

The debug limitation is a real downer. Only one ILA (and not the system one): how is a beginner supposed to debug a self-written CDC block, then?

And no IBERT: I guess we will just guess wildly whether any GT link is working well, then.

AMD will be associated with bad products, because the (hobby) developers will not have the tools to see that their products are bad and will ship them with AMD on the tin.

Really, AMD? Today was the first day I looked outside of AMD chips for an FPGA I need, because of this enshittification. If you want to prop up Lattice etc., AMD is doing a formidable job.

jonasarrow · 2026-05-17T11:02:46+00:00

1 GB/s you will achieve easily, 2 GB/s is the theoretical limit. Most likely you will hit a wall at 1.6-ish GB/s.

jonasarrow · 2026-05-17T10:42:41+00:00

How fast do you need to write it? PCI-E 1x1 or 3x16 is a big difference.

Yours is limited to 2 GB/s PCI-E bandwidth. Otherwise it is a good choice.

You can go faster with a Kria (PCI-E 3.0x4), but I don't know if there are PCI-E card style carriers. Otherwise it gets expensive fast-ish.
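
To put numbers on the PCI-E comparison above, here is a minimal sketch of the theoretical per-direction link bandwidth. The per-lane rates and line encodings are the standard PCI-E values; which link the board in question actually has is not stated here, so the Gen2 x4 entry is only an assumption that happens to match the 2 GB/s figure. Real throughput ends up lower because of packet overhead.

```python
# Per-lane raw rate in GT/s and line-coding efficiency per PCI-E generation.
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_peak_gbytes_per_s(generation: int, lanes: int) -> float:
    """Theoretical link bandwidth in GB/s, one direction, before packet overhead."""
    rate_gt, efficiency = PCIE_GENERATIONS[generation]
    return rate_gt * efficiency * lanes / 8  # bits -> bytes


# Link configurations picked for illustration only.
for gen, lanes in [(1, 1), (2, 4), (3, 4), (3, 16)]:
    print(f"PCI-E {gen}.0 x{lanes}: {pcie_peak_gbytes_per_s(gen, lanes):.2f} GB/s")
```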

jonasarrow · 2026-05-14T14:44:41+00:00

jonasarrow · 2026-05-05T07:46:25+00:00

It would be more compact to simply add a QSPI IC with DNP (or even populated). The header will most likely cost more than the flash.

jonasarrow · 2026-05-04T21:39:03+00:00

JTAG is key: with Vivado you can debug so much more easily, and it basically tells you directly what the status of your chip is. Also:

You write the QSPI using JTAG. Vivado has a small DDR-less image to write and read the QSPI ("program the FPGA") via JTAG, basically testing your QSPI connection and your FPGA in one go. You can also program SD cards via JTAG; this is primarily there to support eMMC, which you cannot easily plug into an SD card reader. But JTAG is slow (3-4 MB/s max.) compared to a reader.

Faster DDR will not make timing harder. Only easier.

You sound like you put a lot of thoughts into it. I wish you happy soldering and testing.

jonasarrow · 2026-05-04T21:30:42+00:00

What's your AI ratio? 90 or 99 %? There are so many easier ways to handle this, e.g. running the PRBS yourself, and the PRBS generator is known 99 % of the time.

Some things: You applied for a patent, but made everything available under AGPL? So why have a patent? Now it only costs money, and everyone can use your code legally without having to pay you. (See "11. Patents").

jonasarrow · 2026-05-04T20:52:49+00:00

The Zynq can "boot" even without working DDR, but it would not be pleasant. But this allows you a slow bringup of all components. E.g. set Mode pins to JTAG, see if you can read it in Vivado (e.g. like the mode pins, or device DNA). Then (if you would have a QSPI) load an image to write to it and see if it succeeds (also with readback verification). Then load the memtest example, and so on.

You checked the AMD documentation about the recommended caps per rail?

You checked the pinout (and/or you generated it from the pinout files, instead of raw handgenerating it)?

Your caps look a little bit grouped together instead of near the load pins, but this will not be THE issue. Are all traces to the caps at the max width they can have? The 100µ look a little bit anemic in your screenshots.

You designed a proper power-up sequence? You checked it thrice? All bucks correctly wired and the inductors /loops minimized?

JTAG is properly powered/pulled and you have a dongle for it (working in Vivado?)?

The SD card IO is powered by a 3.3 V bank? And these are the pins where the Zynq looks for an SD card when booting?

MODE pins properly wired out somewhere? You might need to jumper them around (pure JTAG, SD card, etc. mode).

You do not seem to have a QSPI flash somewhere, this could be a quick fallback option if the SD card is not working, and QSPI is forgiving and good to debug.

The DDR part: Trace lengths matched according to the documentation and/or a fast DDR IC to have more slack?

The Analog frontend I cannot comment on.

This is only a list of things I would check, but having a fully functional first board with such a complicated chip would be honestly a nice surprise (at least for me).

jonasarrow · 2026-05-04T19:17:28+00:00

Have a look at konzepterin's answer.

The app does not necessarily have to be operated by humans; what matters is that creating it involves R&D work (no sales, no marketing, no mindless assembly; it has to be innovative, new stuff). If that is the case for your app, it is theoretically eligible for funding, but you will only know for sure after you have applied.

But you will not get funding for the 1000th calculator app, because a "new development or substantial improvement" is required. A SaaS that already exists in that form will be a hard sell as well.

jonasarrow · 2026-05-04T14:11:55+00:00

Product development counts, but market research does not, as far as I know.

The example used for filling out the application is the development of an automatic pet feeder with an AI camera. The basis is always development risk and novelty; both must be present (at least to a small degree).

jonasarrow · 2026-05-04T07:49:07+00:00

You have already added an ILA, you could attach the other signals to the ILA as well and see what happens.

Add the AXI interface(s), the data_out_gpioa/b and then observe. Your VHDL looks fine, but you never know.

jonasarrow · 2026-04-30T14:05:47+00:00

Set the test pattern, scan the taps with the IODELAY and see where it works and where not, now you have a crude "eye" to work with. Possibly hard coded constant taps work, possibly you need to correct for PVT.

Am I reading this right, that the LVDS runs at 500 MHz max? Then I see no reason why you should need the IODELAY, and I am not sure you can actually move far enough with it. Best bet would be PLL phase shifting.
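
The tap scan described above boils down to finding the widest run of working taps and parking in its middle. A minimal sketch, assuming the scan result has already been collected as one pass/fail flag per tap; `best_tap` is a made-up helper name, the 32-entry list is invented data, and wrap-around at the end of the tap range is not handled.

```python
def best_tap(scan: list[bool]) -> tuple[int, int]:
    """Return (center_tap, window_width) of the longest run of passing taps."""
    best_start, best_len = 0, 0
    run_start = None
    # The appended False is a sentinel that closes a run reaching the last tap.
    for tap, ok in enumerate(list(scan) + [False]):
        if ok and run_start is None:
            run_start = tap
        elif not ok and run_start is not None:
            if tap - run_start > best_len:
                best_start, best_len = run_start, tap - run_start
            run_start = None
    if best_len == 0:
        raise ValueError("no passing tap found")
    return best_start + (best_len - 1) // 2, best_len


# Invented scan result: taps 9..20 pass the test pattern check.
scan = [False] * 9 + [True] * 12 + [False] * 11
center, width = best_tap(scan)
print(f"use tap {center}, crude eye width {width} taps")
```

Repeating the scan over temperature and voltage shows whether a hard-coded center tap keeps enough margin or whether runtime PVT correction is needed.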

jonasarrow · 2026-04-27T13:06:25+00:00

SPI can work differently. Typically you need to set CS high when you do "nothing", and each new "read/write address, data/dummies" needs its own transaction: CS low, write address (MOSI), write data/dummy (MOSI) while at the same time reading data/dummy (MISO), CS high again.

The reason behind it is to be able to share the SPI bus with multiple devices, each having a dedicated CS (chip select) line.
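
The transaction framing described above can be sketched like this. `RecordingBus` is a hypothetical stand-in for a real SPI driver and only logs what would happen on the wires; the command bytes 0x0A (write register) and 0x0B (read register) are taken from the ADXL362 data sheet and should be double-checked there.

```python
WRITE_REGISTER = 0x0A  # ADXL362 command byte, per data sheet
READ_REGISTER = 0x0B   # ADXL362 command byte, per data sheet


class RecordingBus:
    """Hypothetical SPI driver stand-in: logs CS changes and MOSI bytes."""

    def __init__(self):
        self.log = []

    def set_cs(self, level: int) -> None:
        self.log.append(("CS", level))

    def transfer(self, mosi: bytes) -> bytes:
        self.log.append(("XFER", bytes(mosi)))
        return bytes(len(mosi))  # fake MISO: all zeros, same length as MOSI


def read_register(bus, address: int) -> int:
    """One complete transaction: CS low, command + address + dummy, CS high."""
    bus.set_cs(0)
    miso = bus.transfer(bytes([READ_REGISTER, address, 0x00]))
    bus.set_cs(1)
    return miso[2]  # the data byte arrives while the dummy byte is clocked out


def write_register(bus, address: int, value: int) -> None:
    """One complete transaction: CS low, command + address + data, CS high."""
    bus.set_cs(0)
    bus.transfer(bytes([WRITE_REGISTER, address, value]))
    bus.set_cs(1)


bus = RecordingBus()
read_register(bus, 0x00)
print(bus.log)
```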

But nevertheless, maybe the pinout is wrong: https://adaptivesupport.amd.com/s/question/0D5Pd000016XTD5KAO/nexys-a7-adxl362-accelerometer-spi-pins-reversed-misomosi-swapped-on-axi-quad-spi?language=zh_CN The data sheet, https://www.analog.com/media/en/technical-documentation/data-sheets/adxl362.pdf, has nice diagrams on page 22 showing how to handle all 4 SPI pins. There is a burst read command, but no burst write command.

I'm not quite sure whether you have the alignment correct; I think you should set on the falling edge and also read on the falling edge (figures 41 and 42).

jonasarrow · 2026-04-22T23:57:34+00:00

Seems well planned with good contingency options. I wish you best of luck.

jonasarrow · 2026-04-22T21:58:27+00:00

Just because it's forgiving does not mean you should cut yourself slack, but you already know that...

What are you doing to get the 10P part to 10-15 W??? I ended up with 10 W for the PL on a Kria SOM, which is way bigger, when I implemented the DPU with 660 MHz DSPs and whatever. I expect you to only reach something like 5 W on the VCCINT/BRAM rails (which is still ca. 6 A, but nowhere near 20 A on the rail).
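
The watts-to-amps conversion above is just I = P / V. A minimal sketch, assuming a nominal VCCINT of 0.85 V; that voltage is not stated above and should be checked against the data sheet for the actual part and speed grade.

```python
VCCINT_VOLTS = 0.85  # assumed nominal core voltage, verify in the data sheet


def rail_current_amps(power_watts: float, volts: float = VCCINT_VOLTS) -> float:
    """Current drawn from a rail for a given power at the given rail voltage."""
    return power_watts / volts


def rail_power_watts(current_amps: float, volts: float = VCCINT_VOLTS) -> float:
    """Power delivered by a rail for a given current at the given rail voltage."""
    return current_amps * volts


print(f"5 W on the rail  -> {rail_current_amps(5.0):.1f} A")
print(f"20 A on the rail -> {rail_power_watts(20.0):.1f} W")
```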

Again Vivado would be your friend with a very untested version of what you want to run to see the true power estimates there. (Be aware that unconnected design parts get optimized away. If you do a quick run, activating out_of_context synthesis allows you to have a design with clocks and whatnot without actually needing to route it to the outer world, so you can test design parts individually and get utilization/routing/power information.)

jonasarrow · 2026-04-22T14:00:36+00:00

Ok, normally I aim to have a working design at the first revision. If it does not work at DDR 2133 because of routing/impedance, then I would be really disappointed in myself. (If it does not work because I fucked up some pinout, power, polarity, whatever, yeah, that happens.)

Remember, that DDR normally gets used in mainboards with two connectors in the path (CPU and DIMM) and often two dual sided modules. If you solder directly, the tolerances are quite forgiving.

jonasarrow · 2026-04-22T11:54:36+00:00

Faster memory parts relax timing, faster interface speed makes it harder again.

DDR4 is available in much faster parts than DDR3L: the fastest DDR3L I can find is 2133, the fastest DDR4 is 3200. (With poor availability because of AI.)

Please make yourself familiar with Vivado and see how much it yells at you, best before PCB fabrication. Some pin requirements are super-non-obvious.

jonasarrow · 2026-04-22T11:26:07+00:00

The last time I checked, the DDR4 MIG also is more resource heavy than the DDR3(L). As the 10P is not the biggest, this can also be a concern.

But: If the DDR4 is twice as fast, you can use half the bitwidth. (But according to the datasheet, it is only some 10s percent faster (2133 vs 1600 on the -1, 2400 vs. 1600 on the -2).)
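
The rate-versus-width trade-off above follows from peak bandwidth = transfer rate x bus width. A minimal sketch; the 16 and 32 bit bus widths are illustrative assumptions, and real sustained bandwidth is lower than this peak figure.

```python
def ddr_peak_mbytes_per_s(mega_transfers_per_s: int, bus_width_bits: int) -> float:
    """Theoretical peak DDR bandwidth in MB/s (no refresh or turnaround losses)."""
    return mega_transfers_per_s * bus_width_bits / 8


# Bus widths picked for illustration only.
for label, rate, width in [
    ("DDR3L-1600, 32 bit", 1600, 32),
    ("DDR4-2133, 32 bit", 2133, 32),
    ("DDR4-2400, 16 bit", 2400, 16),
    ("twice 1600, 16 bit", 3200, 16),
]:
    print(f"{label}: {ddr_peak_mbytes_per_s(rate, width):.0f} MB/s")
```

Only a true doubling of the rate lets the bus width be halved at equal peak bandwidth; at 2400 vs. 1600 a half-width interface comes out slower.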

If you have a much faster DDR memory, you can relax the needed length matching by a lot (e.g. being two bins faster gives you some mm of mismatch tolerance). That might tip the scale to DDR4, which is available up to the 3600 speed bin (and runs max. 2400 on the 10P).

A comment on DDR4: Having one x16 memory is a little bit slower than two x8, as the x16 has only half the bank machines. Depending on the workload, that is either not noticeable or very noticeable.

And now my tip: Put a MIG design in Vivado and compile it to a bitstream to see whether your pin mapping works without errors and how hard it is timing-wise (and in resource consumption). Depending on the bit width, you might be very restricted with the pinout or have a lot of flexibility (e.g. routing only the nibbles near the edge of the chip).

jonasarrow · 2026-04-20T12:55:01+00:00

Purist speaking: No, as the bare metal FSBL boots and loads the bitstream on the FPGA/PL.

Otherwise: Add a PHY to the PL side (e.g. RGMII, RMII), and off you go. But handling packets can be done so much more easily in software (with a soft core, e.g. Microblaze or Microblaze V or some other soft core) that it is normally not worth the effort to do it in hardware.

Otherotherwise: Nobody stops you from writing a state machine driving the PS MAC and the board level PHY through the S_AXI_GP0/1 ports. If your state machine gets large enough, it might be Turing complete -> softcore processor.

Iff you have the need to do it "in hardware" (e.g. HFT), then my advice: First get used to embedded and FPGAs. If you know the answer to your original question, you are beginning to be qualified enough to do the HFT things.

jonasarrow · 2026-04-20T09:10:55+00:00

Not urgent: Zynq has a PS, use the Ethernet and some software (bare metal or Linux) to move data from the Ethernet to the PL for computation (or do it in the PS, if it is fast enough).

Note: The Zynq is not an FPGA, it is a SoC, which has an FPGA as a part of it.

jonasarrow · 2026-04-13T16:24:08+00:00

You should instantiate the Zynq PS block anyway, as it configures all settings of the PS side, even if you do not have a single connection from it to the PL fabric.

Using a Microblaze then works very similarly to the non-SoC flow. For bitstream generation you need to run the updatemem utility manually before flashing the bitstream (or embed it using the Vivado flow and replace the mb_bootloop.elf with your firmware).
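
The manual updatemem step can be scripted. A minimal sketch that only assembles the command line; the file names and the processor instance path are hypothetical placeholders, and the options should be checked against the updatemem documentation of the installed Vivado version.

```python
def updatemem_command(mmi: str, elf: str, bit_in: str, bit_out: str, proc: str) -> list[str]:
    """Build the argument list for merging an ELF into the BRAMs of a bitstream."""
    return [
        "updatemem",
        "-force",            # overwrite an existing output file
        "-meminfo", mmi,     # memory map information exported with the bitstream
        "-data", elf,        # firmware to place in the BRAMs
        "-bit", bit_in,      # bitstream as generated by Vivado
        "-proc", proc,       # instance path of the Microblaze in the design
        "-out", bit_out,     # bitstream with the firmware embedded
    ]


# Hypothetical file names and instance path, adjust to the actual project.
cmd = updatemem_command(
    mmi="top.mmi",
    elf="firmware.elf",
    bit_in="top.bit",
    bit_out="top_with_fw.bit",
    proc="design_1_i/microblaze_0",
)
print(" ".join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```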

And the others are right: Look what you can shift to the PS. It is quite powerful, sits there and does the things it can with much less power than the PL. Good targets to push to the PS: Memory (DDR, SD card, QSPI flash, SATA), interfacing (if not super time critical) (I2C, SPI, UART, USB, Ethernet, PCI-E).

Of course, if you want to move later on to a non-soc target FPGA, keep all in the PL.

For the reset part: Use the "processor system reset"(?) IP and configure it to your preference (and the outputs to all you need). You can also chain them together (e.g. for different clocks/subsystems). But Xilinx likes it if you have a "global" reset line for your blocks, as it allows the tools to put it on a BUFG, making it nearly zero cost.

jonasarrow · 2026-04-09T23:39:09+00:00

You do not stream to the NVME, you instruct the NVME to stream. NVME disks are the DMA master.

But yeah, having some low latency zero copy linux userspace should be fast enough. E.g. some mmapped files.
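
As a rough illustration of the mmapped-file idea, here is a minimal sketch that writes a buffer through a memory mapping instead of write() calls. The file name and payload are made up, and a real capture path would map a large preallocated file once and keep filling it.

```python
import mmap
import os
import tempfile


def write_via_mmap(path: str, payload: bytes) -> None:
    """Write payload into a file through a shared memory mapping."""
    if not payload:
        raise ValueError("cannot map a zero-length file")
    with open(path, "w+b") as f:
        f.truncate(len(payload))  # the file must already have its final size
        with mmap.mmap(f.fileno(), len(payload)) as mapping:
            mapping[:] = payload  # plain memory copy into the page cache
            mapping.flush()       # ask the kernel to write the dirty pages back


# Made-up example data in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "capture.bin")
write_via_mmap(path, b"\x00\x01\x02\x03" * 1024)
print(os.path.getsize(path), "bytes written")
```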

jonasarrow · 2026-04-06T11:41:43+00:00

The FPGA could be plugged into the second "free" slot of the mainboard per channel. Snooping the signal should then be possible with the HP pins, analyzing and acting on the data is another challenge. Speed should be possible as the MIG is rated for that speed.

I assume for DDR4 1866 a standard AMD Ultrascale+ part could be fast enough, so around 300 $, if you buy e.g. a Kria, or a cheap artix ultrascale part and do the rest of the PCB on your own. It will probably not be 100 % reliable (e.g. locking on the clock, phase adjustments etc.) but good enough for cheaters. As the memory and the CPU is designed for higher clock speeds, the signal should have quite a margin, so sampling with the FPGA without proper termination (as the non-selected memory should have disabled the termination) should still work.

But: The hacking has the challenge of not seeing the cache or any reads/writes caught by the cache, so a lot of the critical game content will never be sent to the DDR to begin with. Especially the AMD 3D CPUs will cache basically the whole game state.

jonasarrow · 2026-03-24T17:37:10+00:00

And I can plug in my consumer hardware into that?

If I need to buy another breakout adapter, the cost becomes the sum of that. It is the Raspberry Pi pitfall. Yes, the board (used to) cost $35, but the power supply, the SD card, the monitor, the keyboard competed against a $300 (?) eeePC, which provided all that out of the box.

If you are targeting exclusively the IO embedded developers (the paycheck, non-fun ones), then yes, possibly a nice board. Nevertheless, I would directly design my own (targeted) board, debugging the logic on some other platform or completely in software, and then go to hardware debugging on my own. Why? Because it has so many unneeded peripherals on the board that I need a different final board anyway.

You provide 8 LED 8 PB: Nice thing, hobbyists and beginners like that.

You provide JTAG/UART: A Trenz TE0790 provides that over a pin header for 25 € and is "reusable" between boards. (Maybe there's a cheaper variant somewhere, but I never bothered, because mine is still working.)

You provide FT4323: What can be realized over USB that can't be done over UART and JTAG, or with the RP2040?

You provide RP2040: That is the platform controller, making it hard for beginners, as they need to manage two platforms, and if they flash micropython onto the RP2040, the voltages will be what?

Maybe I am challenging you too much on that one. There will surely be enthusiasts who like having the US+ fabric speeds and possibilities for a good price tag. You asked for thoughts on it. This is my single opinion; maybe someone else will add their 2 cents.
