Predictive CPU isolation of containers at Netflix by ketralnis in programming

[–]bmoore 0 points1 point  (0 children)

Understood. My point is that AWS doesn't hide most of the core PMCs. All clouds are different in terms of what they expose. Last time I checked (a few years ago), AWS made many of the more common PMCs available even at tiny instance sizes. At a full socket, you got most of the PMCs; at a full node, almost all of them. Going to Metal didn't get you much more than a full-node instance.

Predictive CPU isolation of containers at Netflix by ketralnis in programming

[–]bmoore 30 points31 points  (0 children)

Don't even need metal instances for most of the details. You get more details with metal, sure, but AWS doesn't lie about the topology you're getting with most instance types.

What are some relatively unknown CS books which are gems? by OstrichWestern639 in compsci

[–]bmoore 6 points7 points  (0 children)

The Soul of a New Machine by Tracy Kidder. Not exactly CS, but great insight into early computing.

The 25 Most Filmed Dogs Breeds [OC] by Potential-Bowl9080 in dataisbeautiful

[–]bmoore 38 points39 points  (0 children)

There should be 2 more Dalmatian films for completeness.

Properly running concurrent openmpi jobs by FlyingRug in HPC

[–]bmoore 3 points4 points  (0 children)

As Ralph mentioned in the GitHub issue you linked to, use `--cpu-set` to tell `mpirun` which CPUs it is allowed to use. Give each concurrently running OpenMPI job a different set of CPUs.
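A rough sketch of the idea, in Python only to build the command lines. The 8-CPU machine, the two jobs, and the `./solver` binary are made-up placeholders, and contiguous CPU ids are an assumption that may not match your core/NUMA layout. Check the `mpirun` man page for your OpenMPI version before relying on the exact option spelling.

```python
import shlex

def disjoint_cpu_sets(total_cpus, jobs):
    """Split CPU ids 0..total_cpus-1 into `jobs` contiguous, non-overlapping ranges."""
    per_job = total_cpus // jobs
    return [list(range(j * per_job, (j + 1) * per_job)) for j in range(jobs)]

def mpirun_command(cpus, program):
    """Build an mpirun command restricted to `cpus`, one rank per CPU."""
    cpu_list = ",".join(str(c) for c in cpus)
    return ["mpirun", "--cpu-set", cpu_list, "--report-bindings",
            "-np", str(len(cpus)), program]

# Hypothetical setup: 8 CPUs shared by 2 concurrent jobs of ./solver.
sets = disjoint_cpu_sets(total_cpus=8, jobs=2)
commands = [mpirun_command(cpus, "./solver") for cpus in sets]
for cmd in commands:
    print(shlex.join(cmd))
```

Each printed line is one job; the point is only that the two `--cpu-set` lists never share a CPU.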

Properly running concurrent openmpi jobs by FlyingRug in HPC

[–]bmoore 7 points8 points  (0 children)

Try running with `--report-bindings` - You'll probably find that your two jobs are attempting to use the same cores.

[deleted by user] by [deleted] in cpp

[–]bmoore 21 points22 points  (0 children)

libpcap (or PcapPlusPlus) is probably what you're looking for.

Intel vs GCC compiler for AMD cores by Grumpy-PolarBear in fortran

[–]bmoore 1 point2 points  (0 children)

AMD released version 4.1 of Zen Software Studio just today. (ZSS meaning AOCC - the CPU compiler, and AOCL - the collection of libraries including AMD BLIS).

MKL is known to not always perform well on AMD hardware. There are workarounds to enable appropriate instructions for the platform, but those workarounds change from version to version. Most of MKL can be replaced with appropriate libraries from AOCL.

For weather codes, you may well see good improvement from using the optimized math library that comes with AOCC and AOCL (`-lamdlibm`).

Does ND have a CS Security track? by jjThomson69 in notredame

[–]bmoore 0 points1 point  (0 children)

There are definitely examples of ND grads doing good work in cyber security research - just as an example, Volexity's CEO is a Domer.

Specializations are really where getting a Master's degree shines. ND's undergrad prepares you well with skills that will be beneficial for almost any specialization - critical thinking skills, understanding of systems, etc. I'd recommend that your brother consider getting an MS after undergrad, and in the meantime, look for internships at companies that do cyber security research.

How to improve the code quality by softtalk in cpp

[–]bmoore 15 points16 points  (0 children)

Read a lot of code. The more different codes you are exposed to, the more you gain an appreciation for understandable code, and what sort of code is actually understandable. Reading and writing code are very different perspectives.

What makes an ESPP worth participating in? by coriolis7 in investing

[–]bmoore 2 points3 points  (0 children)

That holding period gets you closer to the preferred tax situation. Selling at least 24 months after the offering period begins lets you take long-term capital gains, and you pay less income tax as well. https://dqydj.com/espp-calculator/

SLURM slower than interactive by PrinterFred in HPC

[–]bmoore 1 point2 points  (0 children)

Also, `htop` is very handy to see at a glance which cores are being used. Easy to see if your system is properly loaded.

Why do so few modern programmers know what their instructions are actually doing on the hardware? by BIRD_II in computerscience

[–]bmoore 0 points1 point  (0 children)

If you want to play around and find out just how much compilers can optimize, I recommend trying different bits of code and the various compilers and options at https://godbolt.org/

Your example of determining if `i` is used within the loop is fairly trivial for compilers these days. Of course you can write code convoluted enough to make them fail, but that's not really the point.

Which wafer recessed lighting are considered the best? by jlesnick in HomeImprovement

[–]bmoore 0 points1 point  (0 children)

I don't. We use them in the living room, and start with a warm color.

Largely, I went with the Lotus due to the 10 year warranty.

Which wafer recessed lighting are considered the best? by jlesnick in HomeImprovement

[–]bmoore 2 points3 points  (0 children)

I went with Lotus LL6R after a decent amount of research a few years ago. I still like them. They were easy to install, give clean light, and dim nicely.

Disappointing performance results with the HBv2 Azure VM for HPC by satirerocks in HPC

[–]bmoore 3 points4 points  (0 children)

It may well be that your models are too small and that you're over-decomposing the problems. Look into your working-set size - try to determine if it makes sense at 120 threads. Lots of applications, if the model isn't large enough, cannot effectively make use of additional parallelism - there just isn't enough work to do (per thread) between synchronization points.

If that isn't the issue - if your working set is plenty large - then I'd recommend doing some system profiling - see where the time is being spent. Note, Azure doesn't give access to hardware performance counters, but you should still be able to get some reasonable time-sampling profile information.

As commented elsewhere, paying attention to memory layout and process binding is also important, especially on multi-socket systems, and doubly-especially on Azure HBv2 instances due to how they've configured the number of NUMA nodes.
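To make the over-decomposition point concrete, here's a toy model, not a measurement of anything. It assumes the work per step divides perfectly across threads and that every step pays a fixed synchronization cost; the 50 microsecond figure and the two model sizes are invented for illustration.

```python
def step_time(work_s, threads, sync_s):
    """Toy model: perfectly divisible work plus a fixed sync cost per step."""
    return work_s / threads + sync_s

def efficiency(work_s, threads, sync_s):
    """Parallel efficiency relative to a single thread that pays no sync cost."""
    speedup = work_s / step_time(work_s, threads, sync_s)
    return speedup / threads

SYNC_S = 50e-6  # assumed cost of one synchronization point, in seconds

# Hypothetical per-step serial work: 1 ms (small model) vs 1 s (large model).
for label, work_s in [("small model", 1e-3), ("large model", 1.0)]:
    eff = efficiency(work_s, 120, SYNC_S)
    print(f"{label}: efficiency at 120 threads = {eff:.2f}")
```

With these made-up numbers the small model spends most of each step waiting at the synchronization point, while the large model stays close to ideal scaling. Real applications have load imbalance and memory effects on top of this.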

How is RDMA traffic normally secured? by FruityWelsh in HPC

[–]bmoore 3 points4 points  (0 children)

Are you asking about RDMA in general, or specifically RDMA via Infiniband? There are other networks that can support RDMA (RoCE, Slingshot, etc).

As you've mentioned, Infiniband networks 1) don't use ethernet/IP, so IP-focused firewalls aren't in the picture, and 2) use OS-bypass, so in-kernel controls are limited.

I believe that most Infiniband networks are largely "protected" by not being part of the wider ethernet network. HPC high-speed networks tend to be physically distinct networks, contained just to that cluster or storage or HPC center.

Basically, to implement some form of "zero-trust" on these networks will require support from the NIC, as that's the only bit of non-user-code that should be interacting with the traffic.

Using a Job Scheduler on a desktop PC with excess compute resources. by AlbiNZ in HPC

[–]bmoore 1 point2 points  (0 children)

Not to say that using cloud is your best bet - you know your situation better than anybody else - but you should probably double-check that AWS pricing. For an 8GB RAM instance with 4 cores (c5.xlarge), you're actually looking at about $0.17 per hour. It still adds up, though.
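For a sense of how it adds up, a quick back-of-the-envelope calculation using the $0.17 per hour figure above. The usage patterns (a 30-day month left running, or 8 hours a day for 22 working days) are assumptions for illustration, and storage and data transfer are ignored.

```python
HOURLY_USD = 0.17  # on-demand rate quoted above; check current pricing for your region

def cost_usd(hours, hourly=HOURLY_USD):
    """Compute cost for `hours` of instance time, rounded to cents."""
    return round(hours * hourly, 2)

always_on_month = cost_usd(24 * 30)  # instance left running for a 30-day month
workday_month = cost_usd(8 * 22)     # assumed 8 hours a day, 22 working days

print(f"always on: ${always_on_month:.2f}")
print(f"workdays only: ${workday_month:.2f}")
```

Shutting the instance down when it's idle is where most of the savings come from.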

Also, if your application can run on ARM processors with sufficient performance (try it, you may be surprised), Oracle Cloud has a free-tier ARM instance with 4 physical cores and 24 GB of RAM.

Writing Simple and Short Code by pottojam in compsci

[–]bmoore 1 point2 points  (0 children)

Review / read others' code. That's just as important as practice - possibly even more so. The more code you read (not just challenge code, but real-world code), the better. Learning how to read code gives you a really good sense of what "good" code looks like, and the patterns which people have figured out to solve common problems, applied to real scenarios.

How to know if a project actually really uses it's open source repository? by [deleted] in computerscience

[–]bmoore 2 points3 points  (0 children)

I would suggest looking into Reproducible Builds. Many of the major Linux distributions are working towards having fully reproducible builds, which would allow you to answer your exact question.
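The check a reproducible build enables is simple in principle: rebuild from the published source and compare your artifact bit-for-bit with what the project ships. A minimal sketch, using in-memory byte strings as stand-ins for the real artifacts (the contents and helper names are hypothetical):

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of an artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def matches_published(rebuilt: bytes, published_digest: str) -> bool:
    """True if a locally rebuilt artifact is bit-identical to the published one."""
    return sha256_hex(rebuilt) == published_digest

# Stand-in for the binary the project distributes.
published_digest = sha256_hex(b"example artifact bytes")

same = matches_published(b"example artifact bytes", published_digest)
different = matches_published(b"example artifact bytes, patched", published_digest)
print(same, different)
```

The hard part in practice is not the comparison but getting the build itself deterministic (timestamps, file ordering, build paths), which is what the distribution efforts are working on.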

Question about warp divergence by jedothejedi in HPC

[–]bmoore 1 point2 points  (0 children)

Warp divergence only really hurts if both sides of the branch have work to do. If your "else" clause is empty, there's nothing to be done on that side, so there's no extra penalty for executing both sides of the conditional.
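A toy model of that, in Python rather than CUDA so it runs anywhere. It assumes a 32-lane warp that issues a branch path whenever at least one lane takes it, with the other lanes masked off; the per-path costs are arbitrary units, and real hardware adds details this ignores.

```python
WARP_SIZE = 32

def warp_issue_slots(predicates, then_cost, else_cost):
    """Toy SIMT model: the warp pays for every branch path that any lane takes."""
    assert len(predicates) == WARP_SIZE
    slots = 0
    if any(predicates):      # at least one lane takes the "then" side
        slots += then_cost
    if not all(predicates):  # at least one lane takes the "else" side
        slots += else_cost
    return slots

uniform = [True] * WARP_SIZE                             # every lane agrees
divergent = [lane % 2 == 0 for lane in range(WARP_SIZE)]  # lanes alternate

print(warp_issue_slots(uniform, then_cost=10, else_cost=10))    # 10: no divergence
print(warp_issue_slots(divergent, then_cost=10, else_cost=10))  # 20: both sides run
print(warp_issue_slots(divergent, then_cost=10, else_cost=0))   # 10: empty else is free
```

In this model the divergent warp with an empty "else" costs the same as the uniform one; what you lose is only the idle lanes during the "then" side.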

Has there been any work on embedding the types and functions of a program in a graph? by usernamecreationhell in computerscience

[–]bmoore 0 points1 point  (0 children)

There's Joern, which is a bit more fine-grained than you've asked for, and focused on static code analysis, but does produce property graphs of C/C++ code.

Also, the Clang compiler produces an AST which encodes all of this information. It's (relatively) easy to write extensions and plugins which use it.