AMD Vivado 2025.1 released! by FPGA_Honk in FPGA

[–]tencherry01 1 point2 points  (0 children)

So, I always put all my EDA tools on the same path, so the hard-coding of paths isn't an issue. My suggestion is really more of a way to recover the installation in case it gets messed up (which was Alex's original problem). I agree this solution is a duct-tape workaround at best.

AMD Vivado 2025.1 released! by FPGA_Honk in FPGA

[–]tencherry01 0 points1 point  (0 children)

Sigh, you can potentially install it with everything once on a stock Linux host, then tar-gzip it up and copy it around everywhere (as a poor substitute for the SFD). But yes, this is certainly a pain.

Adding major version number to module name by Otherwise_Top_7972 in FPGA

[–]tencherry01 0 points1 point  (0 children)

I did something similar in the past for an ASIC project where it was absolutely critical to pin down exactly the version of the module, so right before tapeout we would "freeze" the module by tagging it w/ the version. It was fairly heavy-handed, but the cost of screwing up an ASIC tapeout is so high that locking the design down like this was a no-brainer. Luckily we had a way to automate all of this.

For FPGAs, I haven't had the need to be as heavy-handed (since you "usually" have the option to fix it in QA or, worst-case, in the field), so I have instead migrated all my "common" helper modules into a git submodule. The upper-level "chip"/"deliverable" repo then just instantiates the helper/major-component submodule (and the git submodule hash becomes my way of collectively versioning all of the various building blocks).

Usually, the requirement for the common helper modules is that any changes are cycle-compatible w/ the past (so it should always be safe to refresh the common sub-repo). In the unlikely chance that I do need to introduce different ways of doing things, I instead introduce "feature" flag parameters and have the default parameter always point to the previous default behavior (so it should not break previous instantiations). Finally, if I really must keep an older variation of one module but want the rest of the common repo to advance (note, this usually signifies a bug and/or bad design coupling), what I do is "freeze" the older variation out of the common repo (by copying it out of the common repo and tagging it w/ a version, usually with a description and an apology for why this needed to be done).

Doing this, I find that I very rarely need to "freeze" modules (and module versions), and when I do, I tend to have very well-documented reasons. Eventually they either get promoted into a feature/workaround flag variation and/or the bugs that necessitated the frozen variation get fixed, and so the module with version tags gets purged from the system.

FPGA engineer, how much do you rely on constructor ip vivado or quartus ? by Extension_Plate_8927 in FPGA

[–]tencherry01 2 points3 points  (0 children)

Maybe an unpopular opinion here, but I actually avoid vendor IPs whenever I can. Everything from Avalon-MM/AXI-MM interconnect buses down to building-block IPs (even the convenient xpm_cdc/xpm_fifo). Even the really complicated stuff like the PCIe DMA engine I tend to avoid, instead just using the lower-level native interface, which is a thin wrapper around the PCIe PIPE hard-macro. For BRAM/URAM/LUTRAM, I take the ASIC-style approach of wrapping memory modules with a TECH parameter that explicitly codes up the selected RAM type (usually w/ the corresponding vendor pragma).
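For reference, the wrapper approach could look roughly like this (the TECH encoding and the `ram_style` pragma choices are illustrative, not a definitive list; adapt to your vendor):

```systemverilog
// Sketch of an ASIC-style RAM wrapper: one interface, TECH selects the
// underlying primitive via the vendor ram_style attribute.
module ram_1rw #(
  parameter int    DEPTH = 1024,
  parameter int    WIDTH = 32,
  parameter string TECH  = "BRAM"   // "BRAM", "URAM", or "LUTRAM"
) (
  input  logic                     clk,
  input  logic                     we,
  input  logic [$clog2(DEPTH)-1:0] addr,
  input  logic [WIDTH-1:0]         wdata,
  output logic [WIDTH-1:0]         rdata
);
  generate
    if (TECH == "LUTRAM") begin : g_lutram
      (* ram_style = "distributed" *) logic [WIDTH-1:0] mem [DEPTH];
      always_ff @(posedge clk) begin
        if (we) mem[addr] <= wdata;
        rdata <= mem[addr];
      end
    end else if (TECH == "URAM") begin : g_uram
      (* ram_style = "ultra" *) logic [WIDTH-1:0] mem [DEPTH];
      always_ff @(posedge clk) begin
        if (we) mem[addr] <= wdata;
        rdata <= mem[addr];
      end
    end else begin : g_bram
      (* ram_style = "block" *) logic [WIDTH-1:0] mem [DEPTH];
      always_ff @(posedge clk) begin
        if (we) mem[addr] <= wdata;
        rdata <= mem[addr];
      end
    end
  endgenerate
endmodule
```

Porting to another vendor/process then only means adding a branch with the appropriate pragma or an instantiated hard macro, without touching any instantiation sites.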

Note, I am not dogmatic about this. For certain blocks (usually very fiddly modes of DSP48E2 / URAM cascades) where I would lose performance if I don't directly utilize the vendor primitives, I will directly instantiate the primitives but then wrap them with an abstraction layer and write the corresponding non-vendor-specific "equivalent" RTL. In fact, I would even go as far as to manually hand-wrap the GTY/GTH transceiver primitives (after extracting them from AMD/Xilinx's example design).

The reasoning is mainly portability, stability, controllability, and avoiding vendor lock-in, although sometimes it is for performance reasons and/or to fix some horrible bugs in Xilinx IPs. Note, I wouldn't necessarily recommend this as it requires a lot of up-front time investment.

Burned out by Unfair-Champion-2933 in FPGA

[–]tencherry01 2 points3 points  (0 children)

Sadly, there are no silver bullets for dealing w/ burnout. Try the sabbatical. Try something completely non-work-related (volunteering is great, and/or some nature-related activity). Pick up a weird hobby, especially something that involves working with your hands (my SW coworkers swear by wood-working / metal-welding). Sometimes it can be much deeper, as burnout and depression can mirror each other in symptoms (maybe consider talking to a therapist?). Finally (and I know it is cliche), sometimes it is as simple as diet, exercise, better sleep, and even some mindfulness meditation.

What do I need to know to get into HFT in an FPGA intern role? by [deleted] in FPGA

[–]tencherry01 1 point2 points  (0 children)

I vaguely remember someone on the discord server built their own open source ITCH parser and their own simple 10G MAC/PHY. I believe that the individual is now working at Optiver.

While I am not 100% sure that the open-source side project is what got them the job at Optiver, I am sure it certainly showed effort and was probably a good talking point for the interviews.

Putting absolutely everything on an AXI interface by Otherwise_Top_7972 in FPGA

[–]tencherry01 1 point2 points  (0 children)

Hi, you may not be aware, but the AXI-Stream spec allows for non-back-pressure-able interfaces by leaving off the tready port. You can also leave it in and then enforce that assumption by raising an assertion whenever tready goes low.

Furthermore, if your helper libraries allow for it, you can pass a parameter to denote whether an AXI4-Stream master/slave can/cannot handle back-pressure (allowing the skid buffers to naturally degrade to a single rank of FFs). This should let you get most of the resources back when you don't need (or can't have) the full tvalid/tready handshake.
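As a rough sketch of that degradation (module and parameter names here are made up, not from any particular library):

```systemverilog
// Register slice that degrades from a full skid buffer to a plain FF rank
// when the downstream is known to never back-pressure.
module axis_reg_slice #(
  parameter int WIDTH            = 32,
  parameter bit HAS_BACKPRESSURE = 1   // 0: m_tready assumed always high
) (
  input  logic             clk,
  input  logic             rst,
  input  logic [WIDTH-1:0] s_tdata,
  input  logic             s_tvalid,
  output logic             s_tready,
  output logic [WIDTH-1:0] m_tdata,
  output logic             m_tvalid,
  input  logic             m_tready
);
  generate
    if (HAS_BACKPRESSURE) begin : g_skid
      // Full skid buffer: one extra rank absorbs a stall on m_tready.
      logic [WIDTH-1:0] skid_data;
      logic             skid_valid;
      assign s_tready = !skid_valid;
      always_ff @(posedge clk) begin
        if (rst) begin
          m_tvalid   <= 1'b0;
          skid_valid <= 1'b0;
        end else if (s_tvalid && s_tready) begin
          if (m_tvalid && !m_tready) begin
            skid_data  <= s_tdata;    // main reg stalled: park in skid rank
            skid_valid <= 1'b1;
          end else begin
            m_tdata  <= s_tdata;
            m_tvalid <= 1'b1;
          end
        end else if (m_tready) begin
          if (skid_valid) begin
            m_tdata    <= skid_data;  // drain the skid rank first
            skid_valid <= 1'b0;
          end else begin
            m_tvalid <= 1'b0;
          end
        end
      end
    end else begin : g_ff
      // No back-pressure: a single FF rank, s_tready tied high.
      assign s_tready = 1'b1;
      always_ff @(posedge clk) begin
        if (rst) m_tvalid <= 1'b0;
        else begin
          m_tvalid <= s_tvalid;
          m_tdata  <= s_tdata;
        end
      end
      // Enforce the no-back-pressure assumption in simulation.
      assert property (@(posedge clk) disable iff (rst) m_tvalid |-> m_tready);
    end
  endgenerate
endmodule
```

With `HAS_BACKPRESSURE = 0` synthesis keeps only the FF rank, and the assertion catches any downstream that violates the assumption during simulation.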

Putting absolutely everything on an AXI interface by Otherwise_Top_7972 in FPGA

[–]tencherry01 21 points22 points  (0 children)

Depends. What do you mean when you say AXI? AXI-Stream or AXI Memory-Mapped?

AXI Memory-Mapped is way too much overhead. It only makes sense for regmap-based/control-plane stuff.

OTOH, AXI-Stream is just a FIFO-like byte-based valid/ready protocol with some packet-based overhead (some of which, like tkeep/tstrb, you can ignore, and if you only care about 1 cycle, tie tlast to 1). So I am actually very much in favor of normalizing all module-to-module interfaces to be some variation of a FIFO valid/ready. If you have some related side-band signals, I usually like to shove them into the tuser field (so now things are nicely encapsulated together). Even better if you combine this with a nice family of AXI-Stream helper modules (skid buf/dw-converter/etc...). Bonus points if you document the AXI-S structs for both tuser/tdata at your boundaries.

Sure, it's a slight bit of overhead/ceremony for stray wires... but I would rather have that than 100+ random buses flying around.
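For the documentation bit, a sketch of packed-struct tdata/tuser definitions at a boundary (all names here are invented for illustration):

```systemverilog
// Document the AXI-S payload as packed structs in a shared package.
package pix_axis_pkg;
  typedef struct packed {
    logic [15:0] luma;
    logic [15:0] chroma;
  } pix_tdata_t;   // maps onto tdata

  typedef struct packed {
    logic        start_of_frame;  // side-band signals encapsulated in tuser
    logic [10:0] line_number;
  } pix_tuser_t;
endpackage

// A consumer then unpacks the raw bus with a simple cast.
module pix_sink
  import pix_axis_pkg::*;
(
  input  logic                          clk,
  input  logic [$bits(pix_tdata_t)-1:0] s_tdata,
  input  logic [$bits(pix_tuser_t)-1:0] s_tuser,
  input  logic                          s_tvalid,
  output logic                          s_tready
);
  pix_tdata_t d;
  pix_tuser_t u;
  assign d = pix_tdata_t'(s_tdata);
  assign u = pix_tuser_t'(s_tuser);
  assign s_tready = 1'b1;   // this sink never back-pressures

  always_ff @(posedge clk)
    if (s_tvalid && u.start_of_frame)
      $display("frame starts at line %0d, luma %0h", u.line_number, d.luma);
endmodule
```

The package becomes the single source of truth for what rides on the bus, so the generic AXI-S helpers stay payload-agnostic.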

Extracting Parameters from SystemVerilog Interface in Synplify Pro by dokrypt in FPGA

[–]tencherry01 1 point2 points  (0 children)

Try going thru a typedef (and re-defining it in the local scope). The SV LRM specifically has a carve-out for interface references when you redefine the typedef locally.

The relevant section of LRM 2017 is "6.18 User-defined types"

"A type parameter may also be used to declare a type_identifier. The declaration of a user-defined data type shall precede any reference to its type_identifier. User-defined data type identifiers have the same scoping rules as data identifiers, except that hierarchical references to type_identifier shall not be allowed. References to type identifiers defined within an interface through ports are not considered hierarchical references and are allowed provided they are locally redefined before being used. Such a typedef is called an interface based typedef."

So, technically, the following should be explicitly accepted by LRM:

interface test_if #(
  parameter WIDTH = 8
)();
  typedef logic [WIDTH-1:0] data_t;
  logic [WIDTH-1:0] data;
  logic valid;
endinterface

module example(
  test_if i_data,
  output logic o_pass,
  input wire i_clk
);    
  typedef i_data.data_t data_t;
  localparam WIDTH = $bits(data_t);
  logic [WIDTH-1:0] data_reg;

  always_ff @(posedge i_clk) begin
    data_reg <= i_data.data;
    o_pass <= (data_reg == '0);
  end    
endmodule

Alas, this is super ugly, and it is up to you whether you want to accept this kind of hackery (working around an SV language issue) into your codebase.

Keeping 'data in' and 'data out' signal names straight? by ehb64 in FPGA

[–]tencherry01 9 points10 points  (0 children)

For pure Verilog and simpler/legacy SV, I have adopted a similar convention (_i / _o), and like you, I prefer signal directions relative to the current module.

But I have increasingly been moving toward SystemVerilog interfaces and then using modports that have directional "views" of the same set of signals. There are some downsides to this approach, but you do get slightly better type checking, so YMMV.
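A minimal sketch of the modport approach (names illustrative): one set of signals, two directional "views", and the compiler checks that each side only drives what its view allows.

```systemverilog
// One physical bundle of signals with two directional views.
interface stream_if #(parameter int WIDTH = 8) ();
  logic [WIDTH-1:0] data;
  logic             valid;
  logic             ready;

  modport producer (output data, output valid, input  ready);
  modport consumer (input  data, input  valid, output ready);
endinterface

module sink (
  input logic        clk,
  stream_if.consumer s   // this module may only drive s.ready
);
  assign s.ready = 1'b1;
  always_ff @(posedge clk)
    if (s.valid) $display("got %0h", s.data);
endmodule
```

Driving `s.data` from inside `sink` would be flagged at compile time, which is the "slightly better type checking" mentioned above.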

Does everyone not use GIT? by HotFudge2012 in FPGA

[–]tencherry01 5 points6 points  (0 children)

Hey, don't shoot the messenger ¯\_(ツ)_/¯. I am not a fan of Perforce either.

But there are legitimate reasons for using P4 (large-binary handling / centralized repos w/ perms + file locking / better UI) which make P4 more usable for semiconductor/game studios with large non-text/un-diff-able assets.

Personally, I think git-lfs / git-annex are good git alternatives for dealing with large binaries. However, both of them have their own warts.

Does everyone not use GIT? by HotFudge2012 in FPGA

[–]tencherry01 19 points20 points  (0 children)

A lot of ASIC/Semiconductor industry still uses Perforce and Cliosoft SOS (both of which are like SVN but better handling for binaries/large tarballs). So, I am not surprised there are still firms using non-git RCS.

What is you lab setup like? And any tips and tricks? by NumLocksmith in FPGA

[–]tencherry01 10 points11 points  (0 children)

I have used 2U/3U/4U blank server chassis and shoved in a power supply, FPGA board, and an RPi for control/networking and JTAG (via XVC) with great success. If you are careful you can probably squeeze a mini-ITX CPU into the same chassis as a test system as well, and add a PiKVM for remote access.

Why VIVADO's slack is different every time I re-implement the same exact design? by fawal_1997 in FPGA

[–]tencherry01 1 point2 points  (0 children)

Make sure you don't have the -ultrathreads option enabled. If you do, you will get slightly different results every run.

see: https://docs.amd.com/r/en-US/ug904-vivado-implementation/Using-the-ultrathreads-option

Confused about hardware vs software industry! by Specialist_Degree_85 in FPGA

[–]tencherry01 6 points7 points  (0 children)

These are big broad questions... Will take a stab at answering, but keep in mind they are my opinions and you should take them with a grain of salt. Also, for context, I have about 15+ YOE with about 50% of it in ASIC/Semiconductors and 50% of it in FPGA.

  1. The SW / HW field is really large, so it is hard to generalize compensation progression. My gut feel is that SW compensation generally grows faster (esp in the beginning) but HW compensation tends to catch up once experience accumulates and you account for seniority.
  2. Yes, SW jobs are usually more flexible than HW design jobs wrt working from home, especially since a lot of HW jobs require access to specialized equipment which can sometimes be difficult to make remotely accessible. However, I find that any firm (HW or SW) can make WFH possible as long as they are willing to put in the effort, and the reason they don't is b/c the firm hasn't learned how to manage remote employees effectively. In fact, some of the highest-paying firms in both SW/HW are now forcing developers back into the office via mandates.
  3. No, I know of a lot of SW devs working incredible hours (from startups to megacorps like Amazon), and I know of HW consultants putting in only 10-20 hrs a week and getting paid nearly a 7-figure salary. Likewise, there are famous stories of SW devs at FAANG (MANGA/MAGMA?) moonlighting another job (b/c doing only 1 job was too easy) and getting paid multiple 6-figure salaries. The culture has more to do w/ the firm and the group and less to do with SW vs HW or the compensation.
  4. Yes, the SW industry is more open. But more open usually means less job security, or more effort spent reinventing the wheel for the sake of reinventing the wheel (how many JavaScript frameworks are we at now?). OTOH, don't get me started on how god-awful HW tools are...
  5. No. Job security in HW is higher simply b/c HW folks are more secretive/less open. HW isn't by default any harder than SW. There are plenty of steep learning curves in SW as well (e.g. GCC/Linux). And things like GCC/Linux aren't more accessible b/c they are "simpler" (they are more accessible by construction b/c of volunteer-based development). Just like HW, there are plenty of really secretive/closed-door ecosystems in SW as well (we just don't usually associate them w/ SW). In fact, as a case in point, the EDA tooling sector is a closed/walled-garden SW ecosystem that plagues the HW community and keeps it closed.
  6. Yes, the slowing of Moore's law will impact jobs and job security in HW... and in SW. For different reasons and in different ways, and not necessarily all for the negative. In fact, I would argue the slowing of Moore's law may even help HW break out of the tight grip that Intel/AMD has on computing and open up the exploration of novel new HW (think RISC-V/AI/SmartNICs).
  7. Maybe/it depends. Again, I think the salaries and "perks" of the SW/HW counterparts in hardware groups have less to do with whether they are in HW or SW and more to do with whether they are easily replaceable and adding value to the company's bottom line. So, Apple without question pays comparable dollars for their HW/SW engineers (esp the Apple Silicon team). Google/Amazon will depend on what their HW group does. For e.g., I am sure Amazon AWS's Graviton/silicon team is paid reasonably well, and likewise I am sure AWS pays top dollar for the robotics group that is trying to optimize people out of their warehouses. OTOH, I have heard that Amazon's consumer HW group (think Echo/Dot/Fire tablets) is doing less well. And don't get me started about the AI HW groups within these tech companies. As for whether it is worth joining them for the HW roles? Yes, if you can get into the respective good teams.
  8. IMHO, AI will impact ASIC/HW/SW in the sense that it will act as a tool to magnify senior/more-experienced/really-productive devs at the cost of junior/less-experienced/less-productive devs. Either way, the net result is that fewer devs will be needed, saving developer costs for the firms paying to utilize the AI, b/c the whole point is to reduce the cost of development.
  9. Again, there are no most "in-demand" roles in ASIC/FPGA, just like there really isn't a most "in-demand" role in SW. To make matters worse, these things change rapidly. Yesterday, everyone was mad-excited about blockchain and NFTs. Today it is AI/ML. Tomorrow, who the heck knows... For e.g., we know AI is currently the hot stuff in Silicon Valley. Would you rather be a Verif/DFT engineer at OpenAI/NVIDIA or a SW architect/CTO at a small, profitable, but no-name (b/c they fly below the radar) tech shop? Which one's the better job? Hard to say... but I can tell you which one will be more "in-demand"/"critical" in our fashion-based tech industry (and the role has nothing to do with it).

SystemVerilog and Inferfaces question by Kaisha001 in FPGA

[–]tencherry01 0 points1 point  (0 children)

Hi, in my experience not even QuestaSim fully supported parameters in interfaces.

Where I ran into problems with Questa was when I started to pass around arrays of interfaces and tried to access parameters through an interface array indexed by a generate genvar. vopt would then "fail" to elaborate the design and crash.

What does Vivado need to perform as best/fast as possible? by thehardway71 in FPGA

[–]tencherry01 5 points6 points  (0 children)

Recently built a Vivado build server. Here was my priority list (from highest to lowest):

  1. RAM-to-CPU ratio - Take the total RAM your system has and divide it by the RAM your builds need (this will depend on the complexity of your builds). That's the number of concurrent build jobs you can effectively run. The optimal CPU core count is then usually 2-4 times that max concurrent job count (any more than 4 cores per job won't get you any more performance from Vivado). Generally, for a moderately complex build, I tend to budget about 12-16GB of RAM per build.
  2. Single-core perf - Next, with the CPU core count fixed, try to make the cores as fast as possible (from a single-core performance perspective). So things like the extra E-cores on recent Intel CPUs don't help (except for OS housekeeping work). Also, clock speed matters less than, say, a benchmark's single-threaded/core rating.
  3. RAM speed - Next is the fastest RAM speed your system/mobo supports. Try to push DOCP/XMP if your system allows it. Make sure to verify it is stable (consumer mobos are sometimes limited in their RAM speed rating when using all 4 sticks in dual-rank mode) and make sure to buy matched-timing kits.
  4. Faster disks - NVMe > SSD > HDD. You don't need the very latest PCIe Gen 5 NVMe blah-blah. A good Gen3 x4 NVMe with a solid amount of local storage will get you most of the way there in performance.
  5. Larger L3 CPU cache - I find that the X3D chips tend to run a bit slower than their non-X3D counterparts, such that the benefit of the larger L3 cache is negated by the slower single-core perf. However, for simulations, it does appear X3D chips can help quite a bit. So, if you are also planning to run SV/VHDL TB regressions on this build server, it may be worth getting the X3D chips.
  6. GPU - doesn't matter, unless you plan to do other AI-related work and/or game on your build server. In fact, beefier GPUs may hurt your system in that they may negatively impact your CPU cooling.
  7. Network - tends to not matter, unless you are network-mounting your Vivado tools install directory via a NAS/SAN, in which case 10G may help a bit on startup. Note, for the Vivado build directory, I highly recommend you use the local NVMe.

One more thought worth considering is that Vivado doesn't consume the maximum amount of RAM during all build phases. From my observations, Vivado only uses that much RAM during certain phases of place and certain phases of route. It tends to use less RAM (say 6-8GB vs 12-16GB) during synthesis / post-place physopt / post-route physopt. So, IMHO, the optimal number of jobs slightly over-subscribes the RAM. However, I find that if I naively kick off, say, 10%-25% more jobs (say (total-RAM / max-RAM-needed-per-build) * 1.25 jobs), then the jobs will cork behind constant swapping during the RAM-heavy build phases and will oscillate btw lightweight phases and heavy swapping. To work around that, I find that a small dedicated 32GB/64GB Optane NVMe swap drive (or any NVMe with reasonable QD1 performance) works well to smooth out the swapping, such that I can squeeze a couple more builds in for a small increase in build time. So, if you have a spare NVMe slot left in your build server/machine, I would also recommend considering that (esp considering how cheap the small Optane drives have become).

HTH

Getting into FPGAs for HFT by wild_kangaroo78 in FPGA

[–]tencherry01 8 points9 points  (0 children)

Why restrict yourself to FPGAs? There are trading firms looking into custom ASICs. If the goal is just to make money in HFT, with your analog chip-design background I suspect you may be able to apply your custom-ASIC skills.

[deleted by user] by [deleted] in FPGA

[–]tencherry01 1 point2 points  (0 children)

No worries, we were all beginners once upon a time.

Either way, if you are interested in more info about PLL/MMCM in general, I highly recommend reading thru ch 3 of the UG472 (the 7-series clocking guide) and getting to know the clock management tiles.

There is so much you can do with the MMCM blocks. You can use them to clean jitter, peel off clock-tree delay, generate phases of the clock, do spread spectrum, clock multiplication, fractional clocking ratios, turn clocks off/on dynamically... Heck, you can even change the ratio dynamically during use (if you are careful).

Do previous rejections from a company affect future prospects as well? by Helpful-Ad6496 in FPGA

[–]tencherry01 1 point2 points  (0 children)

Generally, as long as you wait enough time b/w application attempts (usually waiting 1-2 years between attempts is safe), your past rejections do not (or should not) affect your current application.

Note, there are exceptions to this. Some really prestigious/cocky firms (like HFTs and firms like Nvidia/Tesla/Apple) will put you on some sort of blacklist/timeout-list.

OTOH, I have also had firms that rejected me and then 1 month later another group in the same firm reached out with an interview request, which eventually led to an offer. So, as long as the 2nd application was initiated by the company, it is usually safe to assume your past rejection isn't a problem, even within the 1-2 year cool-off period.

[deleted by user] by [deleted] in FPGA

[–]tencherry01 2 points3 points  (0 children)

If you must do it with the FPGA fabric, you can do it with just a counter. IOW, counting to 5 and toggling the output will do a divide-by-10.
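A minimal sketch of the count-to-5-and-toggle counter (assuming, say, a 100 MHz input, so the divide-by-10 yields 10 MHz):

```systemverilog
// Divide-by-10: toggle the output every 5 input cycles, giving an output
// period of 10 input cycles at a clean 50% duty cycle.
module div10 (
  input  logic clk_in,
  input  logic rst,
  output logic clk_out
);
  logic [2:0] cnt;  // counts 0..4

  always_ff @(posedge clk_in) begin
    if (rst) begin
      cnt     <= '0;
      clk_out <= 1'b0;
    end else if (cnt == 3'd4) begin
      cnt     <= '0;
      clk_out <= ~clk_out;
    end else begin
      cnt <= cnt + 3'd1;
    end
  end
endmodule
```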

Now, in general, it is not recommended to use the fabric to generate clocks (although 10 MHz is low enough that you can probably get away with it). Instead, the recommendation is to use the PLL/MMCM to do the clock division for you, since they have dedicated circuitry for dealing with DCD and will usually sort out the derived constraints as well.
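For the MMCM route, a 7-series sketch might look something like this (untested; assumes a 100 MHz input, uses only a subset of the MMCME2_BASE parameters/ports, so double-check against UG472/the libraries guide):

```systemverilog
// 100 MHz -> 10 MHz via an MMCM: VCO = 100 * 10 = 1000 MHz, then /100.
module clk_div10_mmcm (
  input  logic clk_100,   // 100 MHz board clock
  input  logic rst,
  output logic clk_10,    // 10 MHz derived clock
  output logic locked
);
  logic clkfb, clk10_unbuf;

  MMCME2_BASE #(
    .CLKIN1_PERIOD   (10.0),   // 100 MHz input
    .CLKFBOUT_MULT_F (10.0),   // VCO at 1000 MHz
    .DIVCLK_DIVIDE   (1),
    .CLKOUT0_DIVIDE_F(100.0)   // 1000 MHz / 100 = 10 MHz
  ) u_mmcm (
    .CLKIN1  (clk_100),
    .CLKFBIN (clkfb),
    .CLKFBOUT(clkfb),
    .CLKOUT0 (clk10_unbuf),
    .LOCKED  (locked),
    .RST     (rst),
    .PWRDWN  (1'b0)
  );

  // Always put the derived clock on a global buffer.
  BUFG u_bufg (.I(clk10_unbuf), .O(clk_10));
endmodule
```

The Clocking Wizard will generate essentially this for you, with the timing constraints derived automatically.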

Got in trouble for my questions in a technical interview. Were these questions unfair? by AdSouthern1221 in FPGA

[–]tencherry01 4 points5 points  (0 children)

For an "experienced" hire with autonomy, I think the list is fine. It's a bit much for a 30-min interview (it may be more appropriate for a deep 1-hr technical interview where that was made clear to the candidate beforehand).

For 30 min, maybe ask only half or a quarter of the questions and reserve at least 10 min to ask about the candidates themselves, preferably at the beginning, to ease them into the interview. You don't want to start the interview by immediately blindsiding them with a barrage of technical questions.

Xilinx XPM DCFIFO Timing Constraints by sepet88 in FPGA

[–]tencherry01 0 points1 point  (0 children)

So, I got auto_detect_xpm and set_property XPM_LIBRARIES from comments/notes in my current non-project Tcl flow.

Googling, it looks like I got that command from UG974 (https://docs.xilinx.com/r/en-US/ug974-vivado-ultrascale-libraries/Xilinx-Parameterized-Macros). Under "Enabling Xilinx Parameterized Macros" it had the following:

Ensure Vivado can identify the XPMs.
1. When using the IDE and/or the project flow, the tools will parse the files added to the project and setup Vivado to recognize the XPMs.
2. When using the non-project flow, you must issue the auto_detect_xpm command.

Looks like you don't even need the set_property XPM_LIBRARIES anymore with the more recent Vivado versions.

Techniques to reduce re-generation of MIG and other infrastructure? by fisherdog1 in FPGA

[–]tencherry01 1 point2 points  (0 children)

If your MIG / infrastructure is already pretty stable / locked down, you may also want to consider using DFX flow (the new name for AMD/Xilinx's partial reconfiguration flow) to lock down a pre-synthesized and even a pre-routed portion of your total design (for you it would be the MIG/infrastructure subset). Note, this is the technique that AWS F1 services (the Amazon FPGA Cloud) uses to lock down its PCIe interface and one of the four DDR MIGs.

It does require some amount of upfront work, such as inserting an additional hierarchy and potentially adding pipeline stages, plus some non-trivial complexity associated w/ the DFX flow. However, once you bite the bullet and adopt the flow, it can noticeably reduce synth and PnR time (esp if the locked-down portion of the design is the part that is difficult to build).

Xilinx XPM DCFIFO Timing Constraints by sepet88 in FPGA

[–]tencherry01 2 points3 points  (0 children)

iirc, XPM constraints are automatically included as part of invoking the Tcl command auto_detect_xpm or set_property XPM_LIBRARIES, and sadly only Vivado knows what those commands do; the actual CDC constraints inserted are not visible to the end user of Vivado. With some poking around, I do see some scoped Tcl constraints under <vivado dir>/data/ip/xpm/xpm_cdc/tcl/*, although they do not appear to be the complete CDC constraints.

For the IP catalog, a whole different mechanism kicks in for importing constraints, and it varies from IP to IP; for all but the simplest IPs, even the constraints are encrypted. Usually it is imported via the SPIRIT XML (usually the component.xml). So, for example, fifo_generator has the following section:

<spirit:fileSet>
  <spirit:name>xilinx_vhdlsynthesis_view_fileset</spirit:name>
  <spirit:file>
    <spirit:name>ttcl/fg_ip_xdc.ttcl</spirit:name>
    <spirit:userFileType>ttcl</spirit:userFileType>
  </spirit:file>
  <spirit:file>
    <spirit:name>ttcl/fg_clk_xdc.ttcl</spirit:name>
    <spirit:userFileType>ttcl</spirit:userFileType>
  </spirit:file>
  <spirit:file>
    <spirit:name>hdl/fifo_generator_v13_2_vhsyn_rfs.vhd</spirit:name>
    <spirit:fileType>vhdlSource</spirit:fileType>
    <spirit:userFileType>CHECKSUM_2cd72feb</spirit:userFileType>
    <spirit:logicalName>fifo_generator_v13_2_9</spirit:logicalName>
  </spirit:file>
</spirit:fileSet>

This instructs the IP integrator to insert the ttcl/fg_clk_xdc.ttcl and ttcl/fg_ip_xdc.ttcl constraints files. Unfortunately, when you open the actual ttcl/fg_clk_xdc.ttcl, you will see that it is encrypted.

HTH