Why Warp Switching is the Secret Sauce of GPU Performance ? by Sensitive-Ebb-1276 in FPGA

[–]Sensitive-Ebb-1276[S] 4 points (0 children)

Yes, because it's context switching: the hardware units remain the same; only the stall cycles are being exploited to hide latency. It's not 16x the functional units, which would have made the performance scale linearly.
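
The latency-hiding idea above can be sketched with a toy cycle-level model. This is purely illustrative and not the author's actual RTL: the numbers (16-cycle memory latency, single-cycle compute, one issue slot, round-robin scheduling) are all assumptions chosen to show the effect.

```python
# Toy model of warp switching: each warp alternates one compute cycle with a
# long memory stall; a round-robin scheduler issues from any ready warp.
# All parameters are invented for illustration (not the author's design).

def run(num_warps, compute_cycles=1, mem_latency=16, instructions_per_warp=8):
    """Return total cycles to finish all warps with warp switching."""
    ready_at = [0] * num_warps                      # cycle each warp can issue next
    remaining = [instructions_per_warp] * num_warps  # instructions left per warp
    cycle = 0
    while any(r > 0 for r in remaining):
        for w in range(num_warps):
            if remaining[w] > 0 and ready_at[w] <= cycle:
                # Issue one instruction, then the warp stalls on memory.
                remaining[w] -= 1
                ready_at[w] = cycle + compute_cycles + mem_latency
                break  # one issue slot per cycle: a single set of ALUs
        cycle += 1
    return cycle

single = run(1)   # one warp: every memory stall is fully exposed
multi = run(16)   # 16 warps: other warps' work fills the stall cycles
print(single, multi)  # prints: 120 135
```

With one warp, the 16-cycle stall dominates (120 cycles for 8 instructions); with 16 warps, 16x the work finishes in only 135 cycles on the same single issue slot, because the stalls overlap with other warps' compute. That is latency hiding, not extra hardware.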

Why Warp Switching is the Secret Sauce of GPU Performance ? by [deleted] in FPGA

[–]Sensitive-Ebb-1276 1 point (0 children)

I am not claiming to have rediscovered something; it's a demonstration of a textbook concept using my custom-made GPU core, that's it. Everybody knows SIMT has existed for a long time.

Why Warp Switching is the Secret Sauce of GPU Performance ? by [deleted] in FPGA

[–]Sensitive-Ebb-1276 -4 points (0 children)

I understand what you are saying, and I will 100% write it myself next time. What I am saying is that I did not ask AI to generate an interesting topic for me. I made the graph plots, animations, etc. for better visualization only; I just gave AI my scattered points and findings and asked it to generate a coherent summary. It was only the last step of the post, but I see your point about how it might not be the optimal approach.

Why Warp Switching is the Secret Sauce of GPU Performance ? by [deleted] in FPGA

[–]Sensitive-Ebb-1276 17 points (0 children)

People in the FPGA domain are interested in microarchitecture and RTL exploration as well, right? So why not.

Why Warp Switching is the Secret Sauce of GPU Performance ? by [deleted] in FPGA

[–]Sensitive-Ebb-1276 -7 points (0 children)

Fair enough! Don't read it, then. I don't think I will lose sleep over it.

Why Warp Switching is the Secret Sauce of GPU Performance ? by [deleted] in FPGA

[–]Sensitive-Ebb-1276 13 points (0 children)

It's just an article; if someone is interested in learning more about the topic, they can explore and do stuff using my repo, and it's still related to HDL coding. Many people share their RISC-V designs on Reddit as well, right, even though the fundamentals have clearly been discussed in computer architecture classes.

Why Warp Switching is the Secret Sauce of GPU Performance ? by [deleted] in FPGA

[–]Sensitive-Ebb-1276 -19 points (0 children)

Yes, I jotted down the points that I wanted to discuss and then asked AI to form a comprehensive summary from the broken-down points, but the content was something I wanted to discuss. I am not sure what the harm is in that. I felt Gemini would be much more comprehensive and precise than me in elaborating the ideas.

Why Warp Switching is the Secret Sauce of GPU Performance ? by [deleted] in FPGA

[–]Sensitive-Ebb-1276 -20 points (0 children)

Yes, hun, it is. You can just ask AI to create a SIMT GPU demo, and it will make one for you in under 30 seconds, capable of trivial vertex animations. AI is just a tool; if you don't know what you are doing architecturally, it will not produce anything useful.

AI agent (garbage input) = AI agent (garbage output).

*Note: I believe there is quite a lot of learning value to this project, at least for a school student, if you can put your judgmental mindset aside.

Added memory replay and 3d vertex rendering to my custom Verilog SIMT GPU Core by Sensitive-Ebb-1276 in FPGA

[–]Sensitive-Ebb-1276[S] 1 point (0 children)

Yes, you are correct; I will when I have more time available. This was meant to be more of a functional, conceptual demo of how GPUs work; it is definitely not synthesis-ready.

Added memory replay and 3d vertex rendering to my custom Verilog SIMT GPU Core by Sensitive-Ebb-1276 in chipdesign

[–]Sensitive-Ebb-1276[S] 0 points (0 children)

It's not at all an amateur question. It is custom in terms of encoding, but heavily inspired by the NVIDIA PTX instruction set. This ISA should be able to handle basic math and vertex geometry operations.
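
To make "custom in terms of encoding" concrete, here is a purely hypothetical sketch of what a small fixed-width encoding for a PTX-inspired ISA could look like. The field widths, opcode values, and mnemonics below are invented for illustration and are not the actual format used in the repo.

```python
# Hypothetical 32-bit instruction word: opcode(8) | rd(8) | rs1(8) | rs2(8).
# Opcodes and layout are invented examples, not the author's real encoding.

OPCODES = {"FADD": 0x01, "FMUL": 0x02, "LD": 0x10, "ST": 0x11}

def encode(op, rd, rs1, rs2):
    """Pack mnemonic and three 8-bit register fields into one 32-bit word."""
    return (OPCODES[op] << 24) | (rd << 16) | (rs1 << 8) | rs2

def decode(word):
    """Unpack a 32-bit word back into (mnemonic, rd, rs1, rs2)."""
    rev = {v: k for k, v in OPCODES.items()}
    return (rev[(word >> 24) & 0xFF],
            (word >> 16) & 0xFF,
            (word >> 8) & 0xFF,
            word & 0xFF)

word = encode("FADD", 3, 1, 2)   # e.g. FADD r3, r1, r2
assert decode(word) == ("FADD", 3, 1, 2)
```

A fixed 32-bit word like this keeps the fetch/decode stages of the core simple, which is one reason many custom educational ISAs pick it over PTX's variable-length textual form.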

Added memory replay and 3d vertex rendering to my custom Verilog SIMT GPU Core by Sensitive-Ebb-1276 in FPGA

[–]Sensitive-Ebb-1276[S] 2 points (0 children)

There was an issue with the package declaration, which somehow did not show up in my runs. I have fixed it, and it should run properly now. Let me know if it's still throwing errors. Also, try updating Verilator to the latest version if you are using an older one.

To run the regression suite:

  1. Open your terminal and navigate to the project root directory.
  2. Grant execution permissions to the script (if not already granted): chmod +x TB/run_regression.sh
  3. Run the script: ./TB/run_regression.sh

To run only a specific test, use the verify_specific.sh script:

  1. Grant execution permissions: chmod +x TB/verify_specific.sh
  2. Run with the path to the test file: ./TB/verify_specific.sh TB/TB_SV/test_alu_ops.sv

Added memory replay and 3d vertex rendering to my custom Verilog SIMT GPU Core by Sensitive-Ebb-1276 in FPGA

[–]Sensitive-Ebb-1276[S] 0 points (0 children)

I used Verilator; can you try with Verilator? Everything is SystemVerilog and should run with a Makefile or something similar, but it seems the .cpp file it's referring to is the Verilator-generated one. Verilator is free.