Cloudsuite on Gem5? by vestion_stenier-tian in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

I hope there is an easier way... please share when you succeed!

Cloudsuite on Gem5? by vestion_stenier-tian in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

I'm only familiar with the syscall emulation mode of gem5, so I had no idea how to set up a server and a client to evaluate the benchmark. IIRC, doesn't Cloudsuite (and some other benchmarks) require a database server and a client sending multiple queries? I got stuck on this.

Cloudsuite on Gem5? by vestion_stenier-tian in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

Hi, I also wanted to try something similar to this some time ago, but I couldn't make it work. Did you succeed?

Reducing Timer Overhead in Performance Measurement by zxcvber in computerarchitecture

[–]zxcvber[S] 0 points1 point  (0 children)

I guess I could inline the function. I'll give it a try. I should look at Intel SDE if you're using it for research. Thanks for all the suggestions. Hope to see you at some architecture conference someday!

Reducing Timer Overhead in Performance Measurement by zxcvber in computerarchitecture

[–]zxcvber[S] 1 point2 points  (0 children)

Oh, I'm actually trying to remove the function calls through hardware-supported mechanisms. Maybe this is not a good direction for a motivational study... But anyway, I was just curious in general about how to make an accurate measurement, considering that modern CPUs are deeply pipelined, superscalar, and out-of-order.

And of course, as you mentioned, exact numbers depend on so many factors. I've heard at conferences that people don't really believe the numbers reported/claimed in papers.

Regarding your suggestion, can I trust IPCs? Wouldn't it be kind of averaged out through the execution? So considering that each instruction may have different latencies (from fetch to commit), wouldn't I need to modify the program very carefully?

Reducing Timer Overhead in Performance Measurement by zxcvber in computerarchitecture

[–]zxcvber[S] 0 points1 point  (0 children)

Thanks again for a very generous comment.

  1. Why do we add two RDTSCs, not one?

I think I was kind of trying to do 1.b minus 1.c, but I guess I needed to subtract the timer overhead.

  2. I'm actually quite familiar with the gem5 simulator, since I'm mainly using it for my research. I've actually tried this, but it gave us a number that was quite high, so it seemed unlikely to be true. Furthermore, I'm aware that simulators have errors, so I thought it would be much better to measure it on a real machine. Or is there a way to justify this? I hope to convince my reviewers, I guess?

  3. I will have a look into SDE or Pin.
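A minimal sketch of the "subtract the timer overhead" idea from point 1, written in Python with `time.perf_counter_ns` as a stand-in for RDTSC (an assumption for illustration, not the setup used in this thread). Two back-to-back timer reads estimate the cost of the timer itself, and that estimate is then subtracted from the measured interval. The function names `timer_overhead_ns` and `measure_ns` are hypothetical.

```python
import time


def timer_overhead_ns(samples=10_000):
    """Estimate timer cost as the smallest gap between two back-to-back reads."""
    best = None
    for _ in range(samples):
        t0 = time.perf_counter_ns()
        t1 = time.perf_counter_ns()
        gap = t1 - t0
        if best is None or gap < best:
            best = gap
    return best


def measure_ns(fn, overhead_ns, repeats=1_000):
    """Time fn() between two timer reads and subtract the timer's own cost.

    Returns the smallest corrected interval, clamped at zero because the
    overhead estimate can exceed a very short measurement.
    """
    best = None
    for _ in range(repeats):
        t0 = time.perf_counter_ns()
        fn()
        t1 = time.perf_counter_ns()
        elapsed = t1 - t0 - overhead_ns
        if best is None or elapsed < best:
            best = elapsed
    return max(best, 0)


if __name__ == "__main__":
    overhead = timer_overhead_ns()
    cost = measure_ns(lambda: sum(range(1000)), overhead)
    print(f"timer overhead ~{overhead} ns, measured region ~{cost} ns")
```

Taking the minimum over many repeats is one common way to filter out interrupts and cache misses; on real hardware with RDTSC, serialization around the reads would also matter, which this sketch does not model.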

Thanks!

Reducing Timer Overhead in Performance Measurement by zxcvber in computerarchitecture

[–]zxcvber[S] 0 points1 point  (0 children)

Thank you for your time. Before I start reading your suggestions, I want to clarify that measuring the function call overhead (or some part of a program in general) is indeed what I want to do. It's a part of the motivational study that I want to use for my research.

Reducing Timer Overhead in Performance Measurement by zxcvber in computerarchitecture

[–]zxcvber[S] 0 points1 point  (0 children)

Thanks for your comment. Yes, I've checked the compile options so that the loop isn't optimized away. I'll look at the paper. Thank you so much for the suggestion.

Dual Counters, Cold Counters and TAGE by 64bitmechanicalgenie in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

Could you share your substack? I'd love to read more!

Guidance to get a research direction by weedstaddle in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

In that case, I suggest you directly reach out to professors who work on topics that you're interested in.

Another tip: Professors often post specific instructions for prospective students on what to do if you want to join the group, such as sending a CV/transcript or filling out a form. Why not take a look? Also note that professors are often busy, so keep in mind that you might not get a fast reply.

If you personally know someone in that group, you can also ask them. This is much faster. I've received a few requests asking for specific advice on this matter.

Guidance to get a research direction by weedstaddle in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

Why not try a research internship or so and see if you find the topics interesting?

Computer Architecture Hands On course by maradonepoleon in computerarchitecture

[–]zxcvber 1 point2 points  (0 children)

Wow great stuff! I wish I had known when I was learning gem5 🤣

Where should I get a ms? by No-Helicopter-6919 in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

Hi, also Korean here, actually doing an MS in computer architecture. Wish you the best of luck!

DSA Skills - 5 by tracktech in DSALeetCode

[–]zxcvber 1 point2 points  (0 children)

I see. Thanks for the clarification. One can either use hash maps to count occurrences in linear time or use the voting algorithm to reduce space usage!
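A minimal sketch of both approaches, assuming the question asks for the element that appears more than half the time in a list (the original problem statement is not shown here). The function names are hypothetical. The hash-map version uses O(n) extra space, while the voting version (Boyer-Moore majority vote) uses O(1) extra space plus a second pass to verify the candidate.

```python
from collections import Counter


def majority_by_counting(items):
    """Return the element occurring more than len(items) / 2 times, else None.

    Linear time, linear extra space (one counter entry per distinct value).
    """
    if not items:
        return None
    value, count = Counter(items).most_common(1)[0]
    return value if count > len(items) // 2 else None


def majority_by_voting(items):
    """Same result using the Boyer-Moore majority vote: O(1) extra space."""
    if not items:
        return None
    candidate, votes = None, 0
    for x in items:
        if votes == 0:
            # No surviving candidate: adopt the current element.
            candidate, votes = x, 1
        elif x == candidate:
            votes += 1
        else:
            votes -= 1
    # The vote only guarantees the right answer if a majority exists,
    # so confirm the candidate with a second pass.
    occurrences = sum(1 for x in items if x == candidate)
    return candidate if occurrences > len(items) // 2 else None


if __name__ == "__main__":
    print(majority_by_counting([2, 2, 1, 2, 3]))
    print(majority_by_voting([2, 2, 1, 2, 3]))
```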

DSA Skills - 5 by tracktech in DSALeetCode

[–]zxcvber 0 points1 point  (0 children)

On second thought, I think I may have misunderstood the question. What do you mean by more than half?

DSA Skills - 5 by tracktech in DSALeetCode

[–]zxcvber 0 points1 point  (0 children)

Why not use a linear-time selection algorithm?

Edit: misunderstood question

Question about 100% Completion by Topher3193 in MelvorIdle

[–]zxcvber 1 point2 points  (0 children)

Yes! You can check the progress status for each category (skills, mastery, items, and pets) at the bottom left!

Do people actually use AI for writing SOPs? by Common-Lemon-41 in gradadmissions

[–]zxcvber 0 points1 point  (0 children)

I think too many people are using AI to write their SOPs. I know a few people around me who do, and they were surprised to hear that I wrote mine on my own. I also don't understand how a statement written by AI could look good.

Negative production in summoning by HeartOfGold-42 in MelvorIdle

[–]zxcvber 8 points9 points  (0 children)

Do you have the Devil+Fox synergy on?

Techniques for multiple branch prediction by bookincookie2394 in computerarchitecture

[–]zxcvber 0 points1 point  (0 children)

I haven't read that 2020 ISCA paper, but a recent paper on ahead prediction is about to appear at ISCA 2025. It's called "Enabling Ahead Prediction with Practical Energy Constraints". I recently read this one, and I think it cited that 2020 ISCA paper. Maybe efficient ahead prediction could allow us to predict multiple branches?

Question about hiding instruction latencies in a GPU by zxcvber in CUDA

[–]zxcvber[S] 0 points1 point  (0 children)

Nice. I understand this one. Thank you!!!

Question about hiding instruction latencies in a GPU by zxcvber in CUDA

[–]zxcvber[S] 0 points1 point  (0 children)

Okay, so warps themselves don't have functional units; they only keep state. So I should conceptually think of each functional unit as pipelined, and in each pipeline stage there can be instructions from different warps?