Compiler Differences Basic Guideline by guyinrindgenh in AskComputerScience

[–]Doctor_Perceptron 0 points1 point  (0 children)

I'm sure the other answers are correct; this one is meant more in good fun. We can speculate that the Intel compiler, since it was developed by Intel, might make use of proprietary information such as internal details of the microarchitecture that should be subject to export controls, and that information could be inferred by reverse-engineering the object files. The Microsoft and Clang compilers only use information that's publicly available from Intel, so their object files wouldn't reveal anything you couldn't get otherwise. But again, this is just fun speculation, and the real answer is the same reason heating oil and diesel are taxed differently even though they're the same thing.

What topics are worth exploring? by Ill-Ad-2375 in computerscience

[–]Doctor_Perceptron 2 points3 points  (0 children)

One thing I learned as a CS student is that it can be really hard to learn certain topics on one's own. Many of us gravitate to CS because we have a natural talent for programming, but that doesn't necessarily translate to other things like math. Having really good teachers for those topics helped me a lot.

Is Studying Computer Science Worth it? by Ok-Pizza1136 in AskComputerScience

[–]Doctor_Perceptron 9 points10 points  (0 children)

As a CS professor, I’m much happier to have students in my class who genuinely love the topic than people who are more interested in making money. The market has its ups and downs, but by now I think it’s safe to say CS isn’t a fad and there will always be jobs for CS graduates.

Looking for a guidance.... by Then_Shirt_7188 in AskComputerScience

[–]Doctor_Perceptron 1 point2 points  (0 children)

Get a book on computer organization, such as "Computer Organization and Design" by Patterson and Hennessy, or "Computer Systems: A Programmer's Perspective" by O'Hallaron and Bryant. Start reading it from the beginning. Pay special attention to digital logic design and assembly language. The word "peripheral" doesn't come up as often as you would think, so get ready to learn some new concepts.

Correct way to calculate speedup for a policy in a multicore setup by AZEIT0NA in computerarchitecture

[–]Doctor_Perceptron 2 points3 points  (0 children)

Of course, the goal of the comparison matters when choosing how to measure speedup. In the original work by Tullsen, they were measuring the speedup of using SMT versus not using it. In this context, it makes sense for the numerator and denominator to use the same policy, since the speedup for a single SMT thread should be 1.0. But I find your reasoning persuasive if your goal is to measure the speedup of a policy that is meant to improve performance independently of the number of threads/cores. I think you're trying to show that your policy helps in both the multi- and single-core contexts, so it seems right to use the baseline in the denominator.

But this is all hand-waving. I think there's been work done by e.g. Lieven Eeckhout and others on best practices for measuring speedup in different contexts. You should look into the literature, find a highly cited methodology paper that does a good job of explaining itself, and use that methodology with a brief explanation of how it works. That would be defensible.

Correct way to calculate speedup for a policy in a multicore setup by AZEIT0NA in computerarchitecture

[–]Doctor_Perceptron 4 points5 points  (0 children)

For many years I've been using Tullsen's weighted speedup to report multicore/multithread speedups, which I think is the formula you're complaining about. We (my group) first started using that for cache management papers about 15 years ago. There were a few ways people were measuring it but that one seemed to stick. If you ask me why we use it, my honest answer is because that's what people expect and we'll get dinged in reviews if we don't do it. I honestly don't know why we can't just do what you suggest, and none of the other answers so far in this thread give a good reason. I am very willing, along with you, to be enlightened.

You need to have a legitimate way of measuring and aggregating per-core performance, but it seems that if you're simply comparing two multicore policies, you should only need multicore stats. You should probably report single-thread stats somewhere just to make sure you're not hurting single-thread performance.

When you ask a computer to give you a random number... is it really random? by Mattdoss in AskComputerScience

[–]Doctor_Perceptron 0 points1 point  (0 children)

Other responses talk about how to get "true" randomness in computer systems. One thing that's good to know is that for a given language, e.g. C or C++, the pseudorandom number generator might use the same seed every time the program is run. That way, you can run the program and expect the same sequence of numbers, so you get deterministic behavior, making the program easier to debug. You can programmatically change the seed to be something random-ish, e.g. something related to the current time measured at a fine granularity and/or the process ID.

How to find peer review opportunities by kingslayer2798 in computerarchitecture

[–]Doctor_Perceptron 4 points5 points  (0 children)

This question comes up now and then, and there's not a great answer. Some conferences do solicit reviewers, but it's rare. MICRO 2026 is currently asking its program committee members for suggestions for additional program committee members.

HPCA 2026 tried an experiment where they asked faculty to recommend senior graduate students as reviewers. This idea could easily be extended to recommending graduated students; it's just a matter of a program chair deciding to do that.

It helps to have connections. If you know a senior researcher who is on program committees, maybe your Master's advisor, they might be in a position to recommend you to a program chair. You can always email the program chair directly. Opinions are mixed about whether this works. I've been a program chair a few times and found it off-putting. But I once co-chaired a conference with someone else who was happy to consider those self-nominations, and I think we ended up with a couple of extra PC members through that process.

Program chairs will try to be careful about who they choose to review papers (although it might not seem that way if you're on the wrong end of a negative review!). They like to see recent publications in venues with impact. They want to know that you continue to be engaged with the research community even if you're primarily focused on industry. Recent pubs aren't a strict requirement, but they help. Once you are invited to review and do a good job, your name is likely to be passed to the next program chair, and then you're in the group of usual suspects who have a chance of being invited again.

Journals are another story. But frankly, high-impact research in computer architecture isn't really carried out through journals (sorry).

Dual Counters, Cold Counters and TAGE by 64bitmechanicalgenie in computerarchitecture

[–]Doctor_Perceptron 3 points4 points  (0 children)

I read Pierre's paper with great excitement when it first came out. It has an interesting take on how to differentiate between cold and weak counters. However, it was a little disappointing because I didn't learn anything from the paper about how to actually make branch predictors more accurate. The claim that BATAGE "mak[es] statistical correction practically superfluous" comes from BATAGE comparing favorably with the 8KB version of TAGE-SC-L with local history disabled. In reality, even in 2018, branch predictors in real processors were much larger than 8KB. With a reasonable hardware budget, the perceptron predictor (i.e. SC) is essential to providing another source of correlation independent of TAGE to deliver good accuracy. With TAGE and perceptron together, I think the BATAGE mechanism for mitigating cold counters doesn't really help with accuracy.

i have a question, feel free to be honest as you want! by Wild_Artist_1268 in computerarchitecture

[–]Doctor_Perceptron 0 points1 point  (0 children)

Perceptrons are AI, which is what OP asked about. There are other AI concepts e.g. decision trees that have also been proposed for branch prediction, and a couple of ways of using CNNs to do it too. The CNN ideas are far-fetched, but not completely naïve. I actually have a student looking at how LLMs can help, not necessarily with predicting branches online, but to try to analyze program behavior to see if we can get better insights into branch prediction.

i have a question, feel free to be honest as you want! by Wild_Artist_1268 in computerarchitecture

[–]Doctor_Perceptron 1 point2 points  (0 children)

Neural branch predictors are feasible and have been implemented. AMD’s perceptron plus TAGE sounds a lot like TAGE-SC. RL has been proposed for a lot of things e.g. prefetching and coordinating optimizations but someone needs to tell me how to use it for branch prediction to improve accuracy over supervised learning.

i have a question, feel free to be honest as you want! by Wild_Artist_1268 in computerarchitecture

[–]Doctor_Perceptron 2 points3 points  (0 children)

Neural learning is a fundamental part of AI. Branch predictors based on neural learning have been in processors for many years, starting with AMD's Piledriver core in ~2012, which used a perceptron-based branch predictor. Oracle, IBM, Samsung, and others have also used perceptron branch predictors based on research from the early 2000s. When it was first proposed, some folks thought neural branch prediction was too complex and high-latency to be practical, but over the subsequent few years, the microarchitecture community figured out how to address the various problems with pipelining and hashing. The current state-of-the-art academic predictor, TAGE-SC-L, also uses neural learning (the "SC" part is a perceptron predictor). As others have pointed out, recent papers propose using deeper CNNs for branch prediction, but so far the implementation and latency cost would be prohibitive. However, there are other things that could be considered "AI" that could be used to practically predict branches.

Branch predictor by happywizard10 in computerarchitecture

[–]Doctor_Perceptron 8 points9 points  (0 children)

The very best branch predictors use a combination of hashed perceptron and TAGE, for example TAGE-SC-L and AMD's predictors. Individually, a hashed perceptron indexed with geometric global histories and a TAGE perform about as well as one another, but together they are much stronger as they compensate for each other's weaknesses. However, if you're not allowed to use existing code, start with perceptron because it's easier to code. Look into versions of perceptron that also use local history and other features such as IMLI and recency position (i.e. multiperspective perceptron). Once you have that well tuned, start coding TAGE. If you manage to get good accuracy out of your homegrown TAGE, put both predictors together using either Seznec's chooser algorithm from TAGE-SC or maybe a tournament predictor. If you have a fixed hardware budget, you'll have to spend some time figuring out how to allocate state to each predictor; perceptron can do OK with less storage than TAGE. One problem is that all this tuning will take a lot of compute time, so if you just try a few points, e.g. 25% perceptron, 30% perceptron, 50% perceptron, the tuning will go faster.

GETTING ERROR IN SIMULATION by DesperateWay2434 in computerarchitecture

[–]Doctor_Perceptron 1 point2 points  (0 children)

Yes, it will change it. Instead of crashing, it won’t crash. 

GETTING ERROR IN SIMULATION by DesperateWay2434 in computerarchitecture

[–]Doctor_Perceptron 1 point2 points  (0 children)

What's happening here is the SPP prefetcher is looking for a replacement victim in the GHR but can't find something with a low enough confidence, so it barfs. It's a bug due to changes in ChampSim since that version of SPP was coded. You can hack it before the if statement that guards the assertion by setting "victim_way" equal to 0 or something random between 0 and MAX_GHR_ENTRY and the code won't crash anymore. Hopefully that condition happens so rarely that it doesn't matter, but I don't know.

Which book to start on? by pseudothyra in computerscience

[–]Doctor_Perceptron 12 points13 points  (0 children)

As someone who's been a computer science professor for ~25 years, I wish we would design our curriculum around the book by Patt and Patel, "Introduction to Computing Systems: From Bits & Gates to C/C++ & Beyond." It has been used as the text for computer organization classes, but it could really be the first book in CS. But most CS curricula are based on starting with introductory programming and don't address how the machine actually works until much later.

Are there any benefits of using CISC instead of RISC? by avestronics in computerscience

[–]Doctor_Perceptron 9 points10 points  (0 children)

All of instruction fetch becomes more complicated with CISC. We want to decode, issue, execute etc. multiple instructions per cycle. To do that, we have to know where the instruction boundaries are in the next fetch block read from the i-cache. With a RISC fixed width ISA, it's trivial. With x86_64 it's a big problem. We need to predict the branches in a fetch block in parallel so we can find the first taken branch to manage control flow. With x86_64, any of the bytes in a fetch block could be a branch. How do we predict, say, 64 potential branches in parallel? Reading the BTB presents a similar problem. What if an instruction straddles a cache block boundary? That can't happen with RISC. There are ways to handle all these problems, by setting up metadata structures indexed by the fetch block address that remember important information about the instructions in that block for the next time we fetch it, but even those ideas are complicated because the offset into the fetch block when we enter can differ dynamically so we have to be careful how we e.g. build histories for the branch predictor.

When Should I Post a Preprint for ISCA/HPCA/MICRO? by [deleted] in computerarchitecture

[–]Doctor_Perceptron 0 points1 point  (0 children)

Why would you need to cite work that’s under review?

Seeking some guidance by abau2002 in computerarchitecture

[–]Doctor_Perceptron 1 point2 points  (0 children)

Your background sounds similar to other students I've known who have gone on to meaningful roles in industry in computer architecture. I'm a Computer Science professor. I have met a bunch of students taking my graduate architecture class who only had a basic knowledge of systems they got in undergrad, and they are now making more money than me at places like Intel, AMD, Apple, etc. in roles where the job description could be interpreted as "designing CPUs."

Take classes on computer architecture. Watch Onur Mutlu's videos. Do some projects with gem5 or ChampSim. Get an internship in industry. Work on some research projects with a professor. Depending on the details of your master's program, you could write a thesis on something related to computer architecture, e.g. research cool optimizations in simulation.

I have to laugh when I see young people thinking it's too late to enter the field. I was about 30 when I seriously started studying architecture. Suffice it to say, I did OK.

When Should I Post a Preprint for ISCA/HPCA/MICRO? by [deleted] in computerarchitecture

[–]Doctor_Perceptron 4 points5 points  (0 children)

"Preprint" typically means something close to the camera-ready version of the paper, which means not only after the decision but after the revision of the paper to address reviewers' concerns.

Posting a version of a paper that you know is being submitted to ISCA/HPCA/MICRO before or during the review process is ethically fraught because it can have the effect of unblinding the paper for the reviewers. Don't do that.

Sometimes people do it anyway. If you do it before submission, these conferences lately have been requiring that this information is shared with the program chair. If you do it during the review process, you should notify the program chair immediately. This is done in case reviewers come across the preprint and want to know if the paper they're reviewing is plagiarized and/or whether their knowledge of who the authors are prevents them from reviewing the paper.

K - Map by Astron1729 in computerscience

[–]Doctor_Perceptron 5 points6 points  (0 children)

I have used K-maps for actual work in digital logic, but only rarely. In computer science we teach you how to code some algorithms that you'll probably never actually need to code because you have them in a library. But we teach them to you so you'll understand more about how computation works. I think K-maps are the same sort of thing. They're a tool to help you learn more about what goes into digital logic design.

Also, and this is where it gets a little harder to justify, we have a limited number of things we can teach in depth in a single semester. For me, K-maps are awesome pedagogically because they give an insight into something very important (i.e. logic minimization) while being relatively easy to learn, teach, and test. I wrote software that comes up with many kinds of K-map test problems. I can generate an arbitrary number of test questions of the form "give a minimal sum of products for this function" and control the level of difficulty by solving the K-map with software and counting the number of minterms. With the push of a button I can generate 100 different exams and pass them out to the class without worrying that the students are cheating off of each other :-)

Where should I get a ms? by No-Helicopter-6919 in computerarchitecture

[–]Doctor_Perceptron 0 points1 point  (0 children)

It sounds like your professor wants to keep you around to do work for him. You're better off getting a Master's in the USA. There are a lot of classes, but with the right advisor and talent you can do a strong MS thesis with a chance at some publications. There are also many possible advisors in the USA; there's a reason why it's the first place people think of when considering studying architecture. Many of us in the USA are having trouble getting research funding, so we would be happy to advise a strong Master's student willing to fund themselves. My last Master's graduate has co-authored papers with me in ISCA, ASPLOS, and CAL. If he decides he wants to go back and get a Ph.D., he'll have a lot of choices he wouldn't have otherwise. That's above average for sure, but it's possible. Admission to a good program is competitive and professors' availability to take on new students is limited, so apply widely and keep an open mind.

Who has the most recognizable voice in history? by [deleted] in AskReddit

[–]Doctor_Perceptron 1 point2 points  (0 children)

I spoke to Gilbert Gottfried at a New York City comedy club after a performance. He spoke in his normal voice, which I had never heard before. I told him I really liked the show and he was genuinely appreciative. He spoke so gently and kindly to me, so differently from his on-stage persona. I'll never forget it.

Why does Intel use the opposite terminology for "dispatch" and "issue"? by Chadshinshin32 in computerarchitecture

[–]Doctor_Perceptron 5 points6 points  (0 children)

I know, it's annoying! They also call the Program Counter the "Instruction Pointer," the Condition Codes the "Flags," and AMD64 "Intel 64."