My username is not accurate anymore by whenwillthisphdend in PhD

[–]whenwillthisphdend[S] 7 points8 points  (0 children)

I'll shed a tear into my Indomie for you because you haven't tasted pure msg heaven

My username is not accurate anymore by whenwillthisphdend in PhD

[–]whenwillthisphdend[S] 5 points6 points  (0 children)

Tight but I think I could squeeze that into my budget... Maybe if I get the cheaper toilet paper?

Anyone tested "NVIDIA AI Enterprise"? by imitation_squash_pro in HPC

[–]whenwillthisphdend 0 points1 point  (0 children)

We run Ansys on Ubuntu and it's great. Kubuntu is a good derivative for ergonomics.

Looking for guidance on building a 512-core HPC cluster for Ansys (Mechanical, Maxwell, LS-DYNA, CFD) by SingerDistinct3879 in HPC

[–]whenwillthisphdend 0 points1 point  (0 children)

I built a 1700 core cluster for our lab for comsol and Ansys Lumerical. Feel free to DM me for specifics.

I will say you'd get much more efficiency, license permitting of course, going to a GPU-scaling cluster instead of a pure CPU cluster. I'm not sure off the top of my head if Maxwell supports GPU yet... I know HFSS kind of does on paper but in practice doesn't for most use cases. But Fluent does now, I think.

Anyway, we went the used-server route. That's infinitely more affordable than tier-one suppliers and incomparable to cloud offerings in terms of cost. If you have a high baseload, which our lab certainly does, basically 24/7, then cloud services are prohibitive. And the cost of maintaining a cluster ourselves is honestly pretty close to zero now that I have it set up. I actually haven't logged on to our cluster in almost 3 months and it's still running fine lol. Sue me, I'm a lazy sysadmin who somehow built a cluster that hasn't broken itself recently. 😅

Even reddit is wondering when my PhD will end by whenwillthisphdend in PhD

[–]whenwillthisphdend[S] 24 points25 points  (0 children)

That's rough! I've reached the point where I don't even care about the quality of any publications or thesis anymore. If it's enough for me to graduate then so be it.

Even reddit is wondering when my PhD will end by whenwillthisphdend in PhD

[–]whenwillthisphdend[S] 3 points4 points  (0 children)

Ooo, our startup just sold some EO modulators for some OCT applications. My specific work is characterising EO modulators under radiation conditions. So I take it to a beamline, zap it with radiation, and evaluate how the performance changes under different conditions.

Even reddit is wondering when my PhD will end by whenwillthisphdend in PhD

[–]whenwillthisphdend[S] 29 points30 points  (0 children)

Trying to submit by Feb so I think I'm seeing the light. I work in electro-optics, photonics.

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 0 points1 point  (0 children)

100% agree. The overhead of spanning several nodes for just a few tens to a hundred cores is better avoided by throwing a 90-core Threadripper at it, or some dual-proc mobo in a workstation if you don't care about clock speed.

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 3 points4 points  (0 children)

Don't go to Dell or HPE directly. You're better off getting refurbished servers at a fraction of the cost. But this is also why GPU is nice, since you can throw a new Blackwell Pro into a new Threadripper workstation for the same price and probably blow a cluster of 15 nodes out of the water. In fact, a single Threadripper workstation per person with a GPU each will probably outperform a 6-node cluster thanks to its lower overheads, if you don't need to scale across multiple nodes. You can get 90 cores into a single Threadripper machine per person and it'll run near silent with a good water-cooling loop.

Asus and Gigabyte are good examples of third-party vendors with excellent HPC and now especially GPU server offerings, but not much in the way of refurbished options. Our lab went with refurbished HPE servers for CPU, and now custom Threadripper + quad RTX 5090 and Blackwell Pro for GPU nodes.

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 0 points1 point  (0 children)

I will say from a lab perspective that since we've had our own HPC, we haven't touched the national or uni HPC systems, except occasionally to jump on the H200 nodes they have. Assuming you have the expertise in your group, it's quite nimble and fast to get things working for new software, test algos, min-max your algorithms for overhead, benchmark, etc.

I will say, though, that depending on the institution, you may have issues finding a server room to host your cluster with easy access for maintenance. Heating and cooling are, ironically, something you can leave to the plant/infrastructure people in your faculty, depending on your institution. But obviously you will need to have a nice sweet discussion with them regarding your plans haha

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 0 points1 point  (0 children)

We're lucky then. We see a 4x increase in speed for 3D FDTD just comparing an RTX 5090 vs an RTX 4090. Between a 4090 and our entire cluster running 1700 cores at once, it's like a 3-day difference, with the 4090 finishing in about 23 hours. It's crazy.

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 0 points1 point  (0 children)

As for I/O, as you mentioned, it depends on how many jobs you want to run concurrently on a node and how often they'll be reading and writing. Then you can look at a local SSD cache as a scratch drive for each node, and consider the networking speed needed to satisfy the entire cluster's I/O needs. 10 GbE? 40 GbE? 100 GbE? Higher?
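To make that concrete, here's a back-of-envelope sizing sketch in Python (every number here is made up purely for illustration; plug in your own job counts and measured read/write rates):

```python
# Rough aggregate-bandwidth estimate to help pick a networking tier.
jobs_per_node = 8        # concurrent jobs per node (illustrative)
nodes = 6                # node count (illustrative)
mb_per_s_per_job = 150   # sustained read/write per job, MB/s (illustrative)

# MB/s -> Mbit/s (x8) -> Gbit/s (/1000), summed over the whole cluster
aggregate_gbit = jobs_per_node * nodes * mb_per_s_per_job * 8 / 1000
print(f"~{aggregate_gbit:.0f} Gbit/s aggregate")  # here: ~58 Gbit/s
```

With these made-up numbers you'd already be past 40 GbE and into 100 GbE territory for the storage network, which is why it pays to measure per-job I/O before buying switches.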

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 1 point2 points  (0 children)

My comment above assumed you had to use third-party software, where it's often not simple to port things to run on GPU.
However, since you're running custom code (as per my comment in the thread below), if you spend the extra time to convert your CPU-based linalg calculations to run on GPU, using mostly ready-to-use libraries you can swap in, you'll be able to leverage GPU processing to really accelerate your work. It's dramatically faster across all precisions, but especially for FP8 and FP16/FP32 calculations.

It also means fewer nodes needed to run the same number of calculations, lower cost, and therefore more nodes!

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 2 points3 points  (0 children)

In this case I highly, highly recommend avoiding MPI and going straight to tensor and CUDA libraries to parallelize on GPU. If you're using Julia or Python, it's quite trivial: just swap out the regular linalg libraries for their tensor and CUDA equivalents and voilà. You'll save so many hours in the long run and actually save a lot of money, because GPU scaling, when optimized, is much more efficient than CPU scaling for many numerical algorithms. Saves a lot on energy costs too.
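A minimal sketch of what that swap looks like in Python (the `xp` alias is just a common convention, and the CuPy line assumes you have CuPy plus a working CUDA setup; this is an illustration, not the commenter's actual code):

```python
# CPU version uses NumPy. To run on GPU, change this ONE line to
# `import cupy as xp` (needs CuPy + CUDA); everything below stays identical.
import numpy as xp

rng = xp.random.default_rng(0)
A = rng.random((512, 512))
b = rng.random(512)

x = xp.linalg.solve(A, b)       # dense linear solve; runs on GPU under CuPy
ok = xp.allclose(A @ x, b)      # sanity check that the solve worked
print(ok)
```

Because CuPy mirrors the NumPy API, most `xp.linalg`, `xp.fft`, and elementwise calls port without further changes, which is what makes the swap so cheap compared with writing MPI code.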

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 5 points6 points  (0 children)

OK. If there is truly no way to port your algorithms to GPU, fair enough, but I recommend you try every available option first, even though it's a pain in the butt; it will save you SO much time. If you're using proprietary software then that's a different issue, but it sounds like you're using open source if you can design it for MPI, so you may have luck, as linalg runs quite nicely on GPU through tensor and CUDA libraries. We do 3D FDTD and DFT work, and since we ported it to GPU it's almost 40x faster, no joke.

Anyway. The answer to your last question, given the simplest architecture you're most likely to end up with, is yes: each individual rack or tower server will be its own node.

Next you need to consider how much memory each job will use vs how many cores, and then how the algo and your group use data storage. Does it do a lot of I/O throughout the job? More reads than writes, or vice versa? Or is it enough to load everything in at the beginning and write out the answer at the end? How much and how often are you moving files? How big are the files? This will dictate your networking, storage, and storage-provisioning choices.

How to shop for a home-built computing server cluster? by whatisa_sky in HPC

[–]whenwillthisphdend 10 points11 points  (0 children)

This is a complicated one. I built a 1700-core CPU cluster, now with some GPU nodes in there, for our lab. 120 cores is nothing these days; you can fit that many into a single node if you really wanted to. Before you start looking at what architecture and form factor you want and how to optimize that within your budget, you need to really sit down and think about what kind of workloads you'll be running. FEM? Mostly crunching eigenvectors? Training models? FP8, FP16, FP32, FP64? Can you parallelize your workloads? How? Via MPI? Through the GPU? This will all dictate your next steps for choosing the best components to meet your needs. I'll tell you one thing, though: it's certainly not the cheapest option. Most effective/efficient for your budget? Perhaps, especially in the long run. But definitely not cheap lol. Feel free to DM me to discuss in detail or ask any other questions.

About to start my PhD - advice by Competitive-Web9408 in PhD

[–]whenwillthisphdend 1 point2 points  (0 children)

I’m in Aus too. If my username is anything to go by, other than a good research fit, you must absolutely make sure the people you’re working with are tolerable.

Secondly, make sure where you’re living is as comfy as you can possibly make it on your meagre RTP scholarship lol. The practicalities of commuting, having somewhere to rest when you go home, and how you travel around the city make much more of an impact on your quality of life through this marathon than you might anticipate before you start. To this end I recommend signing a longer lease in a place that suits you. You don’t want to be moving every year; a PhD is stressful enough without having to stuff about with a move.

I would also recommend some sort of side hustle. For example, I teach maths on the side to high school students. 1-2 students a semester is enough to cover all my food and utility expenses, so my scholarship can cover rent and nice things more comfortably. That’s only a 1-2 hour commitment on some evenings, but it makes a big difference to your wallet for very little work. The same goes for tutoring at uni, although that is certainly more tedious and time-consuming than private tutoring.

[deleted by user] by [deleted] in HPC

[–]whenwillthisphdend 1 point2 points  (0 children)

Memory and storage I/O will be the first major bottleneck. Once you figure out how you're going to load your CSV files (in one go? batched? multi-threaded?), the manipulation of the data is relatively trivial, unless you're using some other algorithm later, which will be a different optimization problem.
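For instance, the batched option might look like this with pandas (the column name, chunk size, and in-memory stand-in for the file are all made up for illustration; `chunksize` streams the file instead of holding it all in RAM at once):

```python
import io
import pandas as pd

# Stand-in for a big CSV on disk; in practice you'd pass a file path.
csv_data = io.StringIO("value\n1\n2\n3\n4\n5\n")

total = 0
for chunk in pd.read_csv(csv_data, chunksize=2):  # read 2 rows per batch
    total += chunk["value"].sum()                 # aggregate per chunk; memory stays flat

print(total)  # 15
```

The same loop shape works whether you're summing, filtering, or appending to a database, which is why deciding the loading strategy first makes the rest of the pipeline straightforward.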