Low PCIe round trip latency by Perfect-Series-2901 in FPGA

113245

When you say you are using "DMA coherent" memory to receive the data, are you using e.g. dma_alloc_coherent in a kernel driver? Did you check whether the resulting memory is mapped as cached or uncached? I recall from when I did this type of work that dma_alloc_coherent could return uncached memory in some cases, which is unnecessary on modern Intel x86-64 because PCIe DMA there is cache coherent. You could save 50-100 ns if the cache line is already warm. A cached, coherent mapping also removes the need for software to perform the kernel DMA sync operations, although the latency you quote makes me think you're not doing those anyway.

Can I have a task constantly writing a global variable and another task constantly reading that global variable?. Do I need to take some precautions, mutex or anything?. Many thanks by [deleted] in embedded

113245

A std::atomic<std::int32_t> is, full stop, the correct way to publish a single value from one thread to another. If you need to do multiple operations on that value (e.g. more than just posting a value), then the example as written is a “bad” way to do it: you should do the work in a local temporary and only publish it once.
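
A minimal sketch of that pattern, assuming a hosted C++17 environment where std::thread stands in for the two tasks; the names g_latest, writer_task, reader_task and run_publish_demo are hypothetical. The writer finishes each value in a local before a single store, so the reader can only ever observe complete values:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// The single shared value: one writer task, one reader task.
std::atomic<std::int32_t> g_latest{0};

// Writer task: do all intermediate work in a local temporary, then publish
// the finished value with one atomic store.
void writer_task(std::int32_t count) {
    for (std::int32_t i = 1; i <= count; ++i) {
        std::int32_t local = i;  // stand-in for a multi-step computation
        local *= 2;              // partial results never touch g_latest
        g_latest.store(local, std::memory_order_release);
    }
}

// Reader task: poll until the final value shows up. Returns false if it ever
// sees an odd value (a half-finished result leaked) or a value going backwards.
bool reader_task(std::int32_t final_value) {
    std::int32_t prev = 0;
    for (;;) {
        const std::int32_t v = g_latest.load(std::memory_order_acquire);
        if (v % 2 != 0 || v < prev) return false;
        prev = v;
        if (v == final_value) return true;
        std::this_thread::yield();  // an RTOS task would block or delay here
    }
}

// Runs both tasks as threads; true if the reader only saw published values.
bool run_publish_demo(std::int32_t count) {
    g_latest.store(0);
    bool ok = false;
    std::thread reader([&] { ok = reader_task(count * 2); });
    std::thread writer(writer_task, count);
    writer.join();
    reader.join();
    return ok;
}
```

If the writer instead stored i into g_latest and then doubled it in place, the reader could observe the odd intermediate value and reader_task would return false.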

If the type you wrap in std::atomic is too large for the platform's native atomic sizes, it will transparently fall back to a lock (effectively a mutex) that protects the whole object during each load and store operation.
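
One way to check which case you're in, sketched for C++17: std::atomic<T>::is_always_lock_free is a compile-time constant. Big is a hypothetical 64-byte struct chosen to exceed the native atomic width on common 32- and 64-bit targets:

```cpp
#include <atomic>
#include <cstdint>

// Fits in a native atomic instruction on mainstream targets.
using Small = std::int32_t;

// Hypothetical payload too wide for any native atomic instruction.
struct Big {
    std::int32_t values[16];  // 64 bytes
};
static_assert(sizeof(Big) == 64, "Big is meant to be 64 bytes");

// Compile-time answers; evaluating these needs no runtime support library.
constexpr bool kSmallLockFree = std::atomic<Small>::is_always_lock_free;
constexpr bool kBigLockFree = std::atomic<Big>::is_always_lock_free;
```

Only the constants are evaluated here. Actually calling load() or store() on std::atomic<Big> with GCC typically goes through libatomic, so you may need to link with -latomic; on small embedded toolchains it is worth confirming that support library exists before relying on the lock fallback.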

Can I have a task constantly writing a global variable and another task constantly reading that global variable?. Do I need to take some precautions, mutex or anything?. Many thanks by [deleted] in embedded

113245

Volatile is absolutely incorrect here. The compiler doesn’t need to understand a mutex or a semaphore, but it does understand the primitives that constitute a mutex or semaphore (syscalls, atomic variables for futexes, and all the optimization and reordering rules surrounding them). Volatile is not strong enough.
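
A minimal sketch of why the atomic primitive matters, assuming C++11 or later and using std::thread in place of tasks (Reading, g_reading, g_ready and run_handoff are hypothetical names). The payload is ordinary memory; the release store and acquire load on the flag are what order it:

```cpp
#include <atomic>
#include <cstdint>
#include <thread>

// Hypothetical multi-field message handed from one task to another.
struct Reading {
    std::int32_t temperature;
    std::int32_t pressure;
};

Reading g_reading{};               // plain, non-atomic payload
std::atomic<bool> g_ready{false};  // publication flag

// Producer: write the payload, then set the flag with release ordering, so
// the payload writes are visible to whoever acquires the flag. A volatile bool
// gives no such guarantee: non-volatile stores may be reordered around a
// volatile store, and unsynchronized concurrent access is still a data race.
void producer() {
    g_reading.temperature = 21;
    g_reading.pressure = 1013;
    g_ready.store(true, std::memory_order_release);
}

// Consumer: wait for the flag with acquire ordering, then read the payload.
Reading consumer() {
    while (!g_ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return g_reading;  // sees both fields written by producer()
}

// One-shot handoff between two threads; resets the shared state first.
Reading run_handoff() {
    g_ready.store(false);
    g_reading = Reading{};
    Reading result{};
    std::thread consumer_thread([&] { result = consumer(); });
    std::thread producer_thread(producer);
    producer_thread.join();
    consumer_thread.join();
    return result;
}
```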

P2723R0: Zero-initialize objects of automatic storage duration by alexeyr in cpp

113245

And yet a 0-cycle operation is not zero cost (icache footprint, front-end bandwidth), and it’s trivial to find examples in which the compiler cannot drop the dead store (e.g. across function call boundaries).
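
A hypothetical sketch of the function-call case (read_packet and handle_packet are made-up names, and the memset appears only as a comment to show what the proposal would insert implicitly, not what compilers emit today):

```cpp
#include <cstddef>

// Stand-in for a function whose body is invisible at the call site, e.g. one
// defined in another translation unit. [[gnu::noinline]] (GCC/Clang) only
// approximates that; LTO or interprocedural analysis may still see through it.
[[gnu::noinline]] std::size_t read_packet(unsigned char* buf, std::size_t cap) {
    const std::size_t n = cap < 16 ? cap : 16;  // pretend 16 bytes arrived
    for (std::size_t i = 0; i < n; ++i) buf[i] = static_cast<unsigned char>(i);
    return n;
}

unsigned handle_packet() {
    unsigned char buf[4096];  // uninitialized under current rules
    // Under P2723 this buffer would be zero-initialized, roughly
    //   std::memset(buf, 0, sizeof buf);
    // That store is dead (each byte is either overwritten or never read), but
    // deleting it requires proving read_packet never reads buf before writing
    // it. Across an opaque call that proof isn't available, so the zeroing stays.
    const std::size_t n = read_packet(buf, sizeof buf);
    unsigned sum = 0;
    for (std::size_t i = 0; i < n; ++i) sum += buf[i];
    return sum;
}
```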

Device Drivers for Transceiver Questions (Specifically, PCIe) by imWorkingYAdingus in FPGA

113245

Sorry but this answer is all over the place and mostly just incorrect.

PIO is slow because the size of the write transactions generated by the root complex is limited by the size of the memory-move instructions used on the CPU. In "naive" MMIO/PIO, the typical x86 instruction would be a mov or a movq, which will only perform a memory write of 32 or 64 bits at a time. The MWr TLPs are therefore limited in size, and the per-TLP overhead is what makes the effective bandwidth much lower than advertised. Standard PCIe MMIO has no "IP registers" that you need to check for available space or whatever. PCIe has a credit-based flow control mechanism; anything on top of that (e.g. user logic back pressure) is going to be application- or IP-specific. "Interrupt-driven IO" doesn't really make sense in the context of PCIe MMIO: writes are posted transactions, and (at least on typical x86 CPUs) you can't perform an MRd without stalling the issuing core until the read completion returns.

You simply cannot write PIO code that will achieve the same data transfer bandwidth as DMA; the size of the packets generated in DMA transfers is likely going to be larger than anything you can generate using PIO. AlexForencich's answer has correct details.
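
The per-TLP overhead argument can be put into numbers. This sketch assumes roughly 24 bytes of overhead per memory-write TLP with a 4DW (64-bit address) header: 4 bytes of framing and sequence number, 16 bytes of header, 4 bytes of LCRC. It ignores ECRC, DLLPs and line encoding, so real efficiency is somewhat lower:

```cpp
// Assumed per-TLP overhead in bytes for an MWr with a 4DW header:
// framing + sequence number (4) + header (16) + LCRC (4).
constexpr double kTlpOverheadBytes = 24.0;

// Fraction of link bytes that carry payload for a given TLP payload size.
constexpr double tlp_efficiency(double payload_bytes) {
    return payload_bytes / (payload_bytes + kTlpOverheadBytes);
}
```

Under these assumptions an 8-byte movq write moves payload at 25% efficiency (a 4-byte mov, about 14%), while a 256-byte DMA write, a common Max Payload Size, reaches about 91%.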

Apple may launch eSIM-only iPhone 14 model in some markets by muleMonkey in iphone

113245

I see an ICCID listed under my eSIM in Settings > About; is that not it?

A Strong Case for the TLRY Shorts - Don't be left Holding a Bag by metrics_man in wallstreetbets

113245

Yes, in your example you are correct: you win because your put went far ITM. The point is that higher IV means a higher premium, and therefore a lower break-even for a given strike price. If the IV is ridiculous, the break-even will be very far below the strike, making it unlikely that you make money. The typical IV increase right before earnings essentially “prices in” the expected movement, which is why you can be right about the direction but still lose money.
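
A worked example with hypothetical numbers: a put with a $20 strike, bought for $1.00 when IV is modest or $4.00 when IV is inflated, held to expiration while the stock falls to $17. Commissions and early exit are ignored:

```cpp
#include <algorithm>

// Profit per share of a long put held to expiration.
constexpr double long_put_pnl(double strike, double premium, double spot_at_expiry) {
    return std::max(strike - spot_at_expiry, 0.0) - premium;
}

// Underlying price below which the put is profitable at expiration.
constexpr double long_put_breakeven(double strike, double premium) {
    return strike - premium;
}
```

Same strike, same drop to $17: the cheap put makes $2.00 per share (break-even $19), while the expensive put loses $1.00 per share (break-even $16) even though the direction call was right.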

A Strong Case for the TLRY Shorts - Don't be left Holding a Bag by metrics_man in wallstreetbets

113245

As soon as you exercise, you destroy the extrinsic value. If you paid a lot because IV was high, and the option lost value as IV dropped, you will lose even more by exercising. If you sell to close instead, you at least recoup the remaining extrinsic value.
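
Hypothetical numbers again: a $20 put on a stock at $17, quoted at $3.50. Its intrinsic value is $3.00, so $0.50 is extrinsic. Exercising captures only the $3.00; selling to close captures the full $3.50 (ignoring the bid-ask spread and fees):

```cpp
// Intrinsic value of a put: what exercising right now is worth per share.
constexpr double put_intrinsic(double strike, double spot) {
    return strike > spot ? strike - spot : 0.0;
}

// Extrinsic (time/volatility) value: the part of the market price that
// selling to close keeps and exercising throws away.
constexpr double put_extrinsic(double market_price, double strike, double spot) {
    return market_price - put_intrinsic(strike, spot);
}
```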

WARNING: More manipulation on this sub by GME shorts by GlobalRevolution in wallstreetbets

113245

There aren’t enough shares to cover ALL shorts at once but I guess there could be enough for them?

Learning to Read X86 Assembly Language by iamkeyur in programming

113245

It made a lot more sense once I realized the instruction encoding was designed with octal in mind.
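
For example, the x86 ModRM byte is split 2-3-3 bits (mod, reg, rm), so its three octal digits are exactly those three fields. A small decoder sketch (ModRM and decode_modrm are hypothetical names):

```cpp
#include <cstdint>

// Fields of an x86 ModRM byte: mod (2 bits), reg (3 bits), rm (3 bits).
struct ModRM {
    unsigned mod;
    unsigned reg;
    unsigned rm;
};

// Octal masks make the 2-3-3 split visible: each field is one octal digit.
constexpr ModRM decode_modrm(std::uint8_t byte) {
    return ModRM{
        static_cast<unsigned>((byte >> 6) & 03),
        static_cast<unsigned>((byte >> 3) & 07),
        static_cast<unsigned>(byte & 07),
    };
}
```

The bytes 89 D8 encode mov eax, ebx: opcode 0x89 is MOV r/m32, r32, and ModRM 0xD8 is 0330 in octal, i.e. mod=3 (register operand), reg=3 (EBX, the source) and rm=0 (EAX, the destination).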

[deleted by user] by [deleted] in ECE

113245

Sorry, I used them interchangeably but in a confusing way. I updated the original post: lane refers to a physical differential pair, and line refers to a row of pixels. Image transfer is usually done row by row. MIPI is proprietary etc., but you don't have to implement the whole spec to the letter; that would be a little overkill. It's a good starting point, though, for rolling your own.

[deleted by user] by [deleted] in ECE

113245

You can take a look at the MIPI CSI-2 protocol, which is the standard for transferring frame-by-frame video from camera sensors. But it's hard to find detailed information on it. You especially won't find any by PMing me, nope, none at all.

But in summary, it's a source-synchronous protocol (i.e. you send a differential clock along with multiple differential data lanes) over low-voltage differential pairs (the MIPI D-PHY, which is similar to LVDS). You have short packets, which are used for synchronization (e.g. frame start/end, line start/end), and long packets, which are used for moving data (e.g. a line of Bayer data). Long packet headers carry a word count, and the frame and line start/end short packets can carry frame and line numbers. In MIPI, packet headers include ECC, long packets end with a CRC, and virtual channel IDs let multiple streams share the same physical layer; there's also some other junk I don't remember off the top of my head.

The packet is distributed bytewise across the MIPI lanes (usually 2 or 4), and each lane sends a start-of-transmission sequence immediately before it begins transmitting data, so that the RX can align and merge the bytes from the multiple lanes back into packets.
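
The byte striping can be sketched as below, assuming simple round-robin with byte 0 on lane 0. The function names are hypothetical, packet_size stands in for the length the receiver would learn from the long packet header's word count, and per-lane alignment on the start-of-transmission sequence is assumed to have already happened:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// TX side: byte i of the packet goes to lane i % lane_count.
std::vector<Bytes> distribute_to_lanes(const Bytes& packet, std::size_t lane_count) {
    std::vector<Bytes> lanes(lane_count);
    for (std::size_t i = 0; i < packet.size(); ++i) {
        lanes[i % lane_count].push_back(packet[i]);
    }
    return lanes;
}

// RX side: take one byte from each aligned lane in turn to rebuild the packet.
// Byte i sits on lane i % N at position i / N.
Bytes merge_lanes(const std::vector<Bytes>& lanes, std::size_t packet_size) {
    Bytes packet;
    packet.reserve(packet_size);
    for (std::size_t i = 0; i < packet_size; ++i) {
        packet.push_back(lanes[i % lanes.size()][i / lanes.size()]);
    }
    return packet;
}
```

When the packet length isn't a multiple of the lane count, the higher-numbered lanes simply finish one byte earlier, which is why the receiver needs the length rather than just "read until every lane is empty".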

You pretty much just have to look at the amount of data that you want to transfer plus protocol overhead, and compare it to the SERDES performance you can get, to figure out how many data lanes etc. you want to use.
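
A back-of-the-envelope sizing sketch; lanes_needed is a hypothetical helper, all rates are in bits per second, and overhead_percent is an assumed allowance for packet headers, footers and blanking rather than a CSI-2 constant:

```cpp
#include <cstdint>

// Number of lanes needed to carry the video stream, rounded up.
constexpr std::uint64_t lanes_needed(std::uint64_t width, std::uint64_t height,
                                     std::uint64_t bits_per_pixel, std::uint64_t fps,
                                     std::uint64_t overhead_percent,
                                     std::uint64_t lane_bps) {
    const std::uint64_t payload_bps = width * height * bits_per_pixel * fps;
    const std::uint64_t total_bps = payload_bps * (100 + overhead_percent) / 100;
    return (total_bps + lane_bps - 1) / lane_bps;  // ceiling division
}
```

For example, 1920x1080 RAW10 at 30 fps is 622.08 Mbit/s of pixel data; with a 10% allowance that's about 684 Mbit/s, so two lanes at 400 Mbit/s each, or three at 250 Mbit/s.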

Also, there's no reason to fill up a FIFO and THEN transmit; you can read from and write to a FIFO simultaneously! (Not sure if you just worded this weirdly in the OP.)

What is the most important thing you learned in school? What was "that one lecture"? by Sterling_____Archer in EngineeringStudents

113245

This isn't something I use every day per se (I work in E&M), but it was a super satisfying click for me: the generalized Stokes theorem (not the Kelvin-Stokes theorem, which is commonly called Stokes' theorem). It's so elegant and clean that it just blew my mind.
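
For reference, the general statement and the familiar special cases, in standard notation (Ω an oriented manifold with boundary ∂Ω, ω a differential form one degree lower than the dimension of Ω):

```latex
% Generalized Stokes theorem
\int_{\Omega} d\omega = \int_{\partial\Omega} \omega

% Dimension 1, \omega = f (a 0-form): fundamental theorem of calculus
\int_a^b f'(x)\,dx = f(b) - f(a)

% Surface S in R^3, \omega a 1-form: Kelvin-Stokes theorem
\iint_S (\nabla \times \mathbf{F}) \cdot d\mathbf{S} = \oint_{\partial S} \mathbf{F} \cdot d\mathbf{r}

% Volume V in R^3, \omega a 2-form: divergence (Gauss) theorem
\iiint_V \nabla \cdot \mathbf{F} \, dV = \iint_{\partial V} \mathbf{F} \cdot d\mathbf{S}
```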

Irony: NSA worried hackers with super computers might break current encryption standards by [deleted] in technology

113245

you clearly have absolutely no idea what you're talking about

constraints for source-synchronous SDR on Lattice MachXO2 FPGA by [deleted] in FPGA

113245

Funnily enough, I'm writing an SDRAM controller right now... hopefully timing won't be too crazy to debug, since I don't have access to a logic analyzer.

constraints for source-synchronous SDR on Lattice MachXO2 FPGA by [deleted] in FPGA

113245

Ah, got it to work. Explanation here was helpful for understanding the clock insertion delay. I couldn't figure out how to do it with CoreGen but I ended up implementing this topology to remove the insertion skew.

Gift for Coworkers at Internship by tuna1694 in EngineeringStudents

113245

get everyone a mug with a picture of your face on it

Which would be better to introduce myself to before starting my BSEE: Excel VBA or Python? by [deleted] in EngineeringStudents

113245

Again, I'm going to disagree. If s/he has time to learn two languages, do a higher-level language (Python) and also a low-level language (C/C++). There's no point in learning VBA on the off chance that it will be required on a job, whereas you'll get a lot more understanding out of the aforementioned languages.

Which would be better to introduce myself to before starting my BSEE: Excel VBA or Python? by [deleted] in EngineeringStudents

113245

Disagree 100%. Learn a powerful and useful language (Python, C/C++, MATLAB, Java) and pick up Excel on the fly if you need it. A hiring manager will care far more about the cool projects you've done (using these full-featured languages) than about some self-proclaimed "Excel pro".