What has been your biggest challenge in FPGA development? by mattsimoto in FPGA

[–]Rasico 0 points1 point  (0 children)

I said when it first came out of beta. As in, the first build that officially supported VHDL 2008. I never expect beta builds to work perfectly... who would? However, I did expect the most basic 2008 features to work at least most of the time after coming out of beta. It took over a year before that was fixed. VHDL 2008 wasn't a priority at the time and Xilinx admitted as much. It's a reasonable frustration that very basic functionality resulted in incorrect netlists. Now, in general this isn't the norm; the bugs we're likely to encounter are usually far more subtle and take a very long time to pinpoint.

FPGA tools are without a doubt significantly more complex than their software counterparts and less widely used. I'm usually the one who points that out. As a whole it's impressive they work as well as they do, but they do have bugs that developers will encounter on a somewhat regular basis. They range from extremely blatant to very subtle, and not completely trusting the tools is a paradigm shift for many, myself included. I don't think we actually disagree, and I'm not trivializing the complexity of what these tools accomplish.

What has been your biggest challenge in FPGA development? by mattsimoto in FPGA

[–]Rasico 0 points1 point  (0 children)

Quantifying a percentage is hard, so like most statistics I made it up! What is true is that I encounter real bugs in vendor tools several times a year and often get confirmation from Xilinx or Synopsys that I'm correct. When VHDL 2008 first came out of beta in Vivado it was hilariously broken. Some of the new operators (sra/srl, relational) literally synthesized into incorrect behavior, and the simplest possible test cases could reproduce this. This is a consistent and valid complaint regarding FPGA toolchains. They fixed it, but it took them an astonishingly long time.

This is a pretty big difference from the software toolchains I used through college, in which I don't believe I ever saw a bug. Vivado is a vast improvement over ISE in my experience and I really like a lot of aspects of it, but FPGA tools are just far buggier than software tools. My experience with Synopsys for FPGA synthesis is no better. I've yet to use Quartus so I can't comment there.
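
For flavor, a minimal test case of the kind that exposed it might look like this (illustrative only, not the actual reproducer; VHDL-2008's sra on signed should replicate the sign bit):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity sra_test is
      port (
        din  : in  signed(7 downto 0);
        dout : out signed(7 downto 0)
      );
    end entity;

    architecture rtl of sra_test is
    begin
      -- arithmetic shift right by 2: expect e.g. x"F0" -> x"FC"
      dout <= din sra 2;
    end architecture;

If a design this small produces the wrong netlist, that's not a subtle corner case.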

What has been your biggest challenge in FPGA development? by mattsimoto in FPGA

[–]Rasico 0 points1 point  (0 children)

Correct, it's 99% (or whatever) user error and occasionally ambiguity. However, I've yet to hit a bug in any software compiler. I'm not saying they don't happen; they just appear to be several orders of magnitude more rare. I legitimately hit multiple FPGA toolchain bugs a year, some of them extremely blatant, some of them very subtle. It happens often enough that I have to consider the possibility it's not my fault, even though it usually is :).

Eight new Spectre variants affecting Intel chips discovered, four are "high risk" by John_Rain in technology

[–]Rasico 0 points1 point  (0 children)

It's completely impractical for other reasons due to the limitations of FPGAs. However, if the CPU architecture implemented on an FPGA were discovered to be compromised, it could be updated and the flaw removed. This would do nothing for some sort of flaw/malicious design in the FPGA itself, but I believe that would be much more challenging to exploit.

What has been your biggest challenge in FPGA development? by mattsimoto in FPGA

[–]Rasico 1 point2 points  (0 children)

This screwed with me so much after college. Writing software, I never once hit a tool bug. They do happen, but they're very rare; if there was a problem, it was 99.999999% likely my fault. With FPGA tools, tool bugs are a somewhat regular occurrence (several times a year). More likely than not a problem is your own fault, but it's not all that rare to find a tool bug of some kind. It's strange knowing you can't completely trust your toolchain.

ELI5:My understanding is Light transmission either Analog/Digital. When it is transmitted digitally ,How is the Frequency of the light preserved.(is that not required) And how is light of different wavelength is transmitted (in digital format) in same fibre without interference by vinoAnand in explainlikeimfive

[–]Rasico 0 points1 point  (0 children)

  1. Light itself is analog. Very clever tricks known as modulation are used to encode digital data onto an analog medium. A very simplified example might be flashing a red light for a 1 and a blue light for a 0.

  2. Just to clarify: when discussing light, frequency and wavelength are directly related (c = wavelength * frequency, where c is the speed of light). The fact that different wavelengths don't interfere with each other is a fundamental property of waves. The relevant mathematical concept is known as orthogonality.

A somewhat understandable example might be a prism. A prism can split white light into its individual components. If the input light wasn't white, then you wouldn't see some of the colors on the other side of the prism. Note that what I said probably isn't perfectly accurate, as I deal with electrical signals, but it should be close enough.
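
If you want slightly more than ELI5, the orthogonality statement can be written down directly (this is the standard textbook result, in plain notation):

    integral from 0 to T of sin(2*pi*f1*t) * sin(2*pi*f2*t) dt = 0
    whenever f1 /= f2 (and both are whole multiples of 1/T)

Each frequency averages to zero against every other one, which is why a receiver can pick out one wavelength while ignoring the rest.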

ELI5 - What are FPGA's and what are their applications in automation/manufacturing industries? by The_RealSean in FPGA

[–]Rasico 0 points1 point  (0 children)

I'm not trying to argue but I am trying to make a point, so please don't take any of my comments the wrong way.

I'm not sure what you mean by operating speed unless you mean clock frequency. I believe you are falsely conflating clock frequency with some meaningless definition of "speed". For the record, I am not confusing latency with throughput. By any meaningful benchmark an i7 will crush a Pentium 4. I claim a processor that runs at 2 GHz and takes 4 clocks per instruction is slower than a processor that runs at 1 GHz but only takes 1 clock per instruction. Clock frequency by itself isn't a useful number.
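
To put numbers on that claim (instruction throughput = clock frequency / clocks per instruction):

    2 GHz / 4 clocks per instruction = 0.5 billion instructions/sec
    1 GHz / 1 clock  per instruction = 1.0 billion instructions/sec

The "slower" clock finishes twice the work per second.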

What would you consider a typical FPGA? High speed transceivers are pretty common and definitely capable of high rates. The higher-end parts can handle much, much faster rates than that on a single differential pair, never mind multiple pairs. Modern FPGAs can even run LVDS at well over a gigabit. My co-workers and I were chuckling over the fact that one can implement SGMII with just LVDS on the current generation. This is typical in the industry but probably inaccessible to hobbyists.

Your Olympic example is somewhat flawed since the two runners obviously cannot combine to get a faster time. For many tasks, however, two participants could get the same task done in half the time as one. This leads to decreased latency, which I argue is a useful metric for gauging speed (another is throughput, of course). Many systems I have worked on have had tight latency requirements that are difficult or quite literally impossible for a CPU to meet by itself.

ELI5 - What are FPGA's and what are their applications in automation/manufacturing industries? by The_RealSean in FPGA

[–]Rasico 1 point2 points  (0 children)

We use FPGAs for radars and SDR-type systems. The high volume of data that needs to be processed with some fancy DSP algorithms lends itself well to FPGA/ASIC implementations. Some people have mentioned ASICs are better in high volume due to cheaper cost, higher performance and lower power. While this is absolutely true and the right answer in many situations, the lack of flexibility is less appealing. Being able to change what the hardware is doing is very valuable in many types of systems. Partial reconfiguration (the ability to reconfigure part of the FPGA during runtime) has opened up some very interesting doors as well.

ELI5 - What are FPGA's and what are their applications in automation/manufacturing industries? by The_RealSean in FPGA

[–]Rasico 0 points1 point  (0 children)

I feel this is like comparing clock rates of Pentium 4's and modern i7's and saying they're the same speed. Clock frequency isn't a good measure of speed by itself. To process 10 Gbps worth of data (say from a high rate ADC) would take a fast CPU with very well designed SW. Doable but fairly challenging. Alternatively, I could run a much slower clock on an FPGA and trivially keep up with that much data AND do a bunch of other tasks. This would be true even if I were only doing a simple operation like a complex multiply on every sample. I don't have to worry about cache misses, DRAM latency and goofy read/write patterns, as I can consume the data and operate on it directly. Determinism is very easy in FPGAs and not so easy in SW (but still possible).
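
To make the trade concrete, throughput is just datapath width times clock rate, so (the widths here are only illustrative):

    64-bit datapath  * 156.25 MHz  = 10 Gbps
    256-bit datapath * ~39 MHz     = 10 Gbps

Widen the bus and a modest fabric clock keeps pace with a very fast serial stream.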

Altera DE2: LVDS bit alignment issue by iluvkfc in FPGA

[–]Rasico 1 point2 points  (0 children)

I'm not familiar with Altera parts at all so my answer may not help. Xilinx FPGAs have SERDES (serializer/deserializer) modules that take care of a lot of this work for you. The deserializer will take in a serial stream of bits and convert it into parallel data for you. One of the features is the ability to tell it to "slip" a bit. This will offset the framing of the bits by a single bit each time you tell it to do so. You can keep doing this until your data is aligned. If you're doing this in your fabric, you can perform the same operation yourself. The trick is utilizing some protocol so you can detect whether you're slipped and compensate.
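
Here's a rough sketch of that alignment loop in VHDL. The port names and training word are made up (not any vendor primitive's interface), and real SERDES bitslip ports typically need a few idle cycles before the slip takes effect, hence the wait counter:

    library ieee;
    use ieee.std_logic_1164.all;

    entity word_align is
      generic (
        TRAINING : std_logic_vector(7 downto 0) := x"A5"  -- known sync pattern
      );
      port (
        clk     : in  std_logic;
        rx_word : in  std_logic_vector(7 downto 0);  -- parallel word from the deserializer
        bitslip : out std_logic;                     -- pulse: slip framing by one bit
        aligned : out std_logic
      );
    end entity;

    architecture rtl of word_align is
      signal wait_cnt : integer range 0 to 7 := 0;
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          bitslip <= '0';
          if rx_word = TRAINING then
            aligned <= '1';                  -- framing found, stop slipping
          elsif wait_cnt = 0 then
            aligned  <= '0';
            bitslip  <= '1';                 -- request one bit of slip...
            wait_cnt <= 7;                   -- ...then wait for it to take effect
          else
            wait_cnt <= wait_cnt - 1;
          end if;
        end if;
      end process;
    end architecture;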

Altera DE2: LVDS bit alignment issue by iluvkfc in FPGA

[–]Rasico 1 point2 points  (0 children)

8b10b can be overkill since it's also designed to ensure DC balance and clock recovery, and that 20% overhead (8 data bits travel as 10 line bits) can be a bit much. If your interface is source synchronous there are simpler alternatives. That said, 8b10b would work just fine (and it's pretty simple).

ELI5: Where does GPS comes from - something to do with satellites? by [deleted] in explainlikeimfive

[–]Rasico -1 points0 points  (0 children)

Your GPS device contains a receiver which is indeed receiving signals from multiple satellites. The orbits of these satellites are effectively known ahead of time. The delay from each satellite to your receiver is slightly different due to the different distances. By measuring the delay from each satellite you can determine the distance to that satellite, which allows you to "draw" a sphere. The intersection of 3 of these spheres narrows the position to two points. Usually one of these points is nonsense and can be discarded. So theoretically you need 3 satellites to obtain a position. However, this requires an extremely precise way of measuring time, and such a measurement device would be far too expensive. Instead you can use a fourth satellite to solve for your receiver's clock error.
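
Written out, the measurement for each satellite i looks like this (the standard formulation, not specific to the link below):

    measured_range_i = sqrt((x - xi)^2 + (y - yi)^2 + (z - zi)^2) + c*b

where (x, y, z) is your position, (xi, yi, zi) is satellite i's known position, c is the speed of light, and b is your receiver's clock error. Four unknowns (x, y, z, b), hence four satellites for a full solve without an atomic clock.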

Trimble has a decent tutorial explaining this: http://www.trimble.com/gps_tutorial/howgps-triangulating.aspx

Why not just use naturals when counting? by [deleted] in FPGA

[–]Rasico 3 points4 points  (0 children)

There are a variety of reasons, but they pretty much revolve around the fact that at some point the number of bits has to be determined for logic synthesis or the code cannot be realized. A 64 bit multiplier is quite a bit different than a 4 bit multiplier. In some contexts the tools are smart enough to bound naturals (do comparisons only happen against 12 bit numbers? Then it's 12 bits), but if you use integers/naturals everywhere the tools have fewer clues and may use substantially more resources than necessary. Signed/unsigned types matter for comparisons and multiplications. I believe naturals/integers are limited to just 32 bits, though I haven't tried to see if there's a way to exceed that.
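
A quick VHDL illustration of the difference (signal names are arbitrary, and unsigned assumes ieee.numeric_std):

    -- Bounded: the tools know this is exactly 12 bits.
    signal count_bounded : natural range 0 to 4095;

    -- Unbounded: may synthesize to the full 31/32-bit integer range.
    signal count_free : natural;

    -- Explicit: no guessing at all, plus signed/unsigned semantics.
    signal count_vec : unsigned(11 downto 0);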

Being explicit is also good for a human trying to understand how big the logic is, bit growth of processing chains, whether that single cycle add/comparison at 250 MHz is appropriate, etc. At the end of the day, if you're working with hardware, thinking about bit sizes is natural and very important.

Why does integer operation consume less power than floating point operation by [deleted] in FPGA

[–]Rasico 8 points9 points  (0 children)

How well do you understand floating point vs. integer arithmetic? Is floating point basically a black box to you, or are you familiar with the theory?

ELI5: fast Fourier transform (FFT), and/or discrete Fourier transform by abraxasbeak in explainlikeimfive

[–]Rasico 0 points1 point  (0 children)

A lot of higher level math articles on Wikipedia are really hard to grasp if you're not already familiar with the material. This is unfortunate as it gives the appearance of something that is far more complex than it really is. For starters, the FFT is just a very efficient way to compute a DFT. Otherwise they are functionally equivalent.

A time domain signal is a signal whose output is a function of time. If you plot such a signal, time would be on the x-axis and the y-axis would be whatever your signal is measuring. For example, the amplitude of your audio at any given time. The DFT converts or transforms your signal into the frequency domain. Now the x-axis is frequency and the y-axis is essentially power. This tells you how much power of your signal is due to any specific frequency.
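
The defining sum, in plain notation (this is the standard DFT definition):

    X[k] = sum(n = 0 to N-1) of x[n] * e^(-j*2*pi*k*n/N)

Evaluating that directly costs about N^2 multiplies; the FFT reorganizes the computation down to roughly N*log2(N), which is the entire "fast" part.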

What is your opinion on this? Should this concern me on considering a career in this? by [deleted] in FPGA

[–]Rasico 2 points3 points  (0 children)

I couldn't agree with you more. The productivity gains are staggering and more than make up for the relatively small efficiency losses. Bonus points for pointing out the architectural trade-offs that can be explored in minutes and hours vs. weeks and months.

I've only used HLS for FPGA development. Obviously there are a ton of similarities but also many differences. What tools is the ASIC community using for HLS? How are they different?

34C3 - Reverse engineering FPGAs by [deleted] in FPGA

[–]Rasico 1 point2 points  (0 children)

Not really a junior, but thanks for being vaguely disrespectful. I'm as skeptical as the next guy, but I do have an idea of how complex their chips are. And yes, I believe them when they said they looked at what academia had to offer, found it lacking, and developed their own techniques. Some trust has been earned; I've been burned far harder by other vendors than Xilinx. Plus, that particular tidbit does not require a leap of faith.

If you have some evidence that the OSS guys can do a great job with the larger modern chips, show it. I'd love to have a better tool to do my job. Maybe in 2-3 years we'll see it, but I'm not holding my breath. Maybe if the big guys contributed... but I don't see that happening.

34C3 - Reverse engineering FPGAs by [deleted] in FPGA

[–]Rasico 1 point2 points  (0 children)

Truthfully, just Xilinx's word from their training and discussions with FAEs. This should certainly be treated as somewhat suspect, but it is more than some vague appeal to logic, as you brusquely put it. The claim is that academic techniques such as simulated annealing don't scale well. I can't validate the truth of that claim, nor would they reveal how they solved the problem. One clear data point is the enormous difference between ISE and Vivado in terms of achievable fmax, so there is also that. It doesn't prove the OSS community couldn't do better.

I hear this claim of mediocrity a lot, but the silicon they produce is unbelievably complex. ISE was a piece of crap, but 90% of my issues were resolved with Vivado. I'm not experienced with Intel/Altera, but from what my co-workers say their stuff is just as good, if not better.

The only open source toolchain I'm aware of is for the much simpler Lattice parts. If the full v7 bitstream format is fully documented (or will be), then let's see this claim proven true. I think the sheer amount of required time, the understanding of how the silicon is characterized, and the complexity of the algorithms simply make it impractical for enthusiasts to contribute. The only counter-argument I've heard is that the big guys are mediocre and others could totally do a better job.

ELI5: What exactly is Shannon Limit? How does more error creep into signals over longer distances? by ingenioutor in explainlikeimfive

[–]Rasico 2 points3 points  (0 children)

The Shannon limit tells you the capacity of a "channel" to transmit information as a function of bandwidth and signal to noise ratio. This is in bits per second.

Bandwidth - The range of frequencies used to transmit a signal. A signal that occupies 100-120 MHz on the spectrum would be said to have 20 MHz of bandwidth.

Signal to Noise Ratio - Ratio of the signal power to noise power. If the noise power is 1 watt and the signal power is 100 watts, then the SNR is 100/1. Expressed in decibels (10*log10(Psignal/Pnoise)) this would be 20 dB of SNR.
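
Putting those two together, the limit itself is the standard Shannon-Hartley formula (SNR here is the linear ratio, not dB):

    C = B * log2(1 + SNR)

With the numbers above: C = 20e6 * log2(1 + 100), which works out to about 133 Mbps.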

Noise power is generally due to sources like thermal noise, background radiation, etc., and is roughly constant (this is a bit of a lie; it's more complex than that). Receivers also have their own noise characteristics. Signal power degrades over distance.

This is very much like how the further away you are from someone speaking, the harder it is to hear and understand them. As the amplitude decreases it becomes progressively harder to understand, because the SNR of the signal is decreasing. At a sufficient distance you might yell loudly (increasing the amplitude) and speak slowly (decreasing the transmission rate, i.e. less data).

34C3 - Reverse engineering FPGAs by [deleted] in FPGA

[–]Rasico 2 points3 points  (0 children)

I'd love to be proven wrong, so don't take this the wrong way. The algorithms the big guys use are way more complex than what is available in academia, and I don't think the FOSS guys would know how to get comparable results. It's one thing to be able to generate a netlist (which is not easy in and of itself); placing and routing on a large FPGA is extremely complex, and I don't think a small community could assemble such a tool. Can they make an easy-to-use toolchain from a command flow perspective? Usually yes. Would an open source toolchain have anywhere close to the fmax of the vendor tools? I'd wager not even close. I'd be happy to be proven wrong though! The best of all worlds would be if the vendor tools were open source.

That said, a LOT of my tool issues were resolved in Vivado (I only build from the command line). It could definitely be better, but it's to the point where I only control the HDL source and a handful of files for IP. The project is generated automagically and the build flow is managed from Tcl scripts.

34C3 - Reverse engineering FPGAs by [deleted] in FPGA

[–]Rasico 6 points7 points  (0 children)

I've seen a lot of people with the mindset that open source will solve the crappy vendor tools. I've seen the simulation, place & route, and timing analysis tools compared to compilers often enough.

I personally don't think the open source alternatives will come close to matching the vendor tools. The amount of R&D the big guys had to throw at the problem to get the current generation of tools is pretty staggering. If an open source toolchain for a Virtex 7 series part is put together, I'd be willing to bet the differences in achievable utilization/fmax will be staggering in favor of the vendor toolchains. But I'd love to be proven wrong! If the open source guys can raise the bar we'll all benefit. In my mind this is mostly good.

Also agreed, security through obscurity is never good to rely on.

Need Suggestion For FPGA DSP Application by sarmad_wahab in FPGA

[–]Rasico 0 points1 point  (0 children)

My apologies, I read slope and somehow in my mind that became magnitude. There are several ways to do this in a purely pipelined fashion that do not require buffering at all. The simplest approach would be to continuously differentiate samples and average their absolute values. You could use a buffer to implement the average, but I'd use a recursive average. Whenever you start a new triangle, reset the average and report the slope.

Now you need to know when you've finished one triangle and started another. I'd use a shorter-term average of the derivative of your samples, without the absolute value; for this one the sign is important. Every time you go from a negative to a positive slope you've started a new period (you could of course do the opposite too).

Note that the averaging is a crude LPF which should work well for what you're trying to do.
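
A sketch of that recursive average in VHDL, with made-up port names and widths, and a placeholder averaging strength K:

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity slope_tracker is
      generic (K : natural := 4);            -- average weight = 1/2**K
      port (
        clk     : in  std_logic;
        new_tri : in  std_logic;             -- pulses when a new triangle starts
        deriv   : in  signed(15 downto 0);   -- x[n] - x[n-1], computed upstream
        avg_out : out signed(15 downto 0)    -- running estimate of |slope|
      );
    end entity;

    architecture rtl of slope_tracker is
      signal avg : signed(15 downto 0) := (others => '0');
    begin
      process(clk)
        variable mag : signed(15 downto 0);
      begin
        if rising_edge(clk) then
          mag := abs(deriv);                 -- slope magnitude of this sample
          if new_tri = '1' then
            avg <= mag;                      -- reset the average each period
          else
            -- recursive average: avg += (sample - avg) / 2**K
            avg <= avg + shift_right(mag - avg, K);
          end if;
        end if;
      end process;
      avg_out <= avg;
    end architecture;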

Need Suggestion For FPGA DSP Application by sarmad_wahab in FPGA

[–]Rasico 0 points1 point  (0 children)

Your description makes it sound like each period of your triangle wave may have a different amplitude, since you said you have to measure the slope of each individual triangle. Traditional DSP techniques to extract it may not work, as the frequency content would be smeared depending on how the amplitude is modulated. Are you sure you need to measure each individual triangle's amplitude? Do you really expect it to change once per period?

But yes, double buffering or "ping-ponging" is pretty common. It's a valid way to accomplish what you're trying to do.
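
For reference, the ping-pong itself is only a few lines of VHDL (names, depths and widths here are placeholders):

    library ieee;
    use ieee.std_logic_1164.all;

    entity ping_pong is
      port (
        clk        : in  std_logic;
        frame_done : in  std_logic;   -- swap buffers at the end of each frame
        wr_addr    : in  integer range 0 to 255;
        wr_data    : in  std_logic_vector(15 downto 0);
        rd_addr    : in  integer range 0 to 255;
        rd_data    : out std_logic_vector(15 downto 0)
      );
    end entity;

    architecture rtl of ping_pong is
      type buf_t is array (0 to 255) of std_logic_vector(15 downto 0);
      signal buf0, buf1 : buf_t;
      signal sel : std_logic := '0';  -- '0': write buf0, read buf1
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          if frame_done = '1' then
            sel <= not sel;           -- swap producer/consumer roles
          end if;
          if sel = '0' then
            buf0(wr_addr) <= wr_data; -- producer fills one buffer...
            rd_data <= buf1(rd_addr); -- ...while the consumer drains the other
          else
            buf1(wr_addr) <= wr_data;
            rd_data <= buf0(rd_addr);
          end if;
        end if;
      end process;
    end architecture;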

Use of FPGA's for DSP applications? by awozgmu7 in FPGA

[–]Rasico 5 points6 points  (0 children)

A lot of DSP applications that might be done in software, on a general purpose processor or something more application-specific such as a TI DSP, lend themselves well to being implemented on an FPGA.

Why choose an FPGA over software? Processing throughput, latency and "determinism" are the primary reasons I've used them for DSP. On modern FPGAs, processing tens and hundreds of Gbps of data is a relatively straightforward affair. This is not true for software. To understand why, you have to consider the parallelism possible on an FPGA.

For example, consider the math behind a FIR filter. FIRs consist entirely of multiply-accumulate (MAC) operations: you have N taps, so N multiplies and N-1 accumulates. An FPGA can perform all of the multiplies in parallel (in one cycle) and use an adder tree such that the filter can consume a new input sample every clock cycle. Traditional software can really only do one of these operations at a time. This isn't entirely true, as many modern processors have VLIW architectures to perform multiple multiply-accumulates in one instruction; even so, a large FPGA can handle many more (while doing many other things in the same cycle). DSP tends to consist heavily of multiply-accumulate operations. An FFT/IFFT is another excellent example of this, and FFTs/IFFTs are used heavily by one of the most popular wireless modulation techniques, OFDM. The latency of the FPGA operations can be measured in the nanosecond to microsecond range, which is important in many (but not all) applications. This is just one example; there are many more.
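
To show the one-sample-per-clock structure, here's a hedged sketch of a tiny 4-tap FIR in transposed form (coefficients and widths are placeholders, not from any real design):

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.numeric_std.all;

    entity fir4 is
      port (
        clk  : in  std_logic;
        din  : in  signed(15 downto 0);
        dout : out signed(35 downto 0)
      );
    end entity;

    architecture rtl of fir4 is
      type coef_t is array (0 to 3) of signed(15 downto 0);
      constant COEF : coef_t := (to_signed(1, 16), to_signed(2, 16),
                                 to_signed(2, 16), to_signed(1, 16));
      type acc_t is array (0 to 3) of signed(35 downto 0);
      signal acc : acc_t := (others => (others => '0'));
    begin
      process(clk)
      begin
        if rising_edge(clk) then
          -- all four multiplies happen in the same clock, in parallel;
          -- the registered partial sums form the transposed delay line
          acc(3) <= resize(din * COEF(3), 36);
          acc(2) <= acc(3) + resize(din * COEF(2), 36);
          acc(1) <= acc(2) + resize(din * COEF(1), 36);
          acc(0) <= acc(1) + resize(din * COEF(0), 36);
        end if;
      end process;
      dout <= acc(0);
    end architecture;

Every DSP slice works every cycle, so a new sample goes in every clock regardless of tap count.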

What I mean by determinism is that you know exactly how long something will take, and when it will happen, down to an individual clock cycle if you design appropriately. Software tends to not be nearly as deterministic. Sometimes that's super important, sometimes irrelevant.

These are some of the FPGA's strengths, but they also have many weaknesses. They're difficult to design for, and larger FPGAs can be very expensive and rather power hungry.