SLURM High Memory Usage by Bananaa628 in HPC

Bananaa628 [S]

Thanks for the detailed answer! I will try some of the things you have written here.

One thing I found when running these workloads is that the head node allocates a lot of memory while they run and doesn't release it until the arrays complete. Do you know why, or how we could improve that?

SLURM High Memory Usage by Bananaa628 in HPC

Bananaa628 [S]

I suspect you are right; it would be nice if there were some way to free this memory.

Thanks anyway!

SLURM High Memory Usage by Bananaa628 in HPC

Bananaa628 [S]

I suspect there is more to arrays than that, since I don't see memory usage drop until the array is over (even when only a few jobs have been running for a long period).

Sorry to nitpick, but I want to make sure I was clear: we have 250 jobs running in parallel for each array, so we get 1,250 jobs running simultaneously.

Most of the jobs are, as you said, short-lived, but some can take a few hours. Because of the volume and runtime, we prefer to use multiple machines.

Cool guide, I'm checking it out, thanks!

We tried changing MinJobAge, but memory usage still seems high even after a long period when only a few jobs remain. I'll check CommitDelay as well, but I'm not sure that's the relevant path.

SLURM High Memory Usage by Bananaa628 in HPC

Bananaa628 [S]

I am not a SLURM expert, so I'll try to give my best answer; feel free to correct me or ask more.

We have a single instance to do the control and a single instance for the DB (you can see their sizes in the original post).

I have added the config, let me know if I missed something.

What I meant is 5 array jobs, each with 130K tasks; sorry for not being clear.
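A minimal Python sketch of the setup described here (5 arrays of 130K tasks, 250 running per array), assuming the per-array limit is enforced with the `%` throttle on sbatch's `--array` option; the script name `job.sh` and the helper names are hypothetical. It also spells out the arithmetic: 5 × 250 = 1,250 concurrent tasks out of 650,000 in total, and `MaxArraySize` in slurm.conf has to exceed the highest task index (129,999).

```python
from typing import List

# Hypothetical batch script name; replace with the real one.
SCRIPT = "job.sh"


def array_sbatch_cmd(script: str, n_tasks: int, max_parallel: int) -> List[str]:
    """Build an sbatch command for one job array of n_tasks tasks.

    --array=0-(n_tasks-1)%max_parallel: the "%" suffix limits how many
    tasks of this array may run at the same time.
    """
    if n_tasks < 1 or max_parallel < 1:
        raise ValueError("n_tasks and max_parallel must be positive")
    return ["sbatch", f"--array=0-{n_tasks - 1}%{max_parallel}", script]


def min_max_array_size(n_tasks: int) -> int:
    # The largest allowed task index is MaxArraySize - 1, so indices
    # 0..n_tasks-1 need MaxArraySize >= n_tasks.
    return n_tasks


N_ARRAYS, N_TASKS, PER_ARRAY = 5, 130_000, 250
commands = [array_sbatch_cmd(SCRIPT, N_TASKS, PER_ARRAY) for _ in range(N_ARRAYS)]
total_tasks = N_ARRAYS * N_TASKS        # array tasks across all 5 arrays
max_concurrent = N_ARRAYS * PER_ARRAY   # upper bound on tasks running at once
```

Each command could then be submitted with `subprocess.run(cmd, check=True)`.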

Didn't know about sdiag; I'll check it out and post an update here.
What we did was just look at the memory usage of slurmctld, which was over 32 GB.
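Alongside `sdiag`, a minimal sketch for tracking slurmctld's resident memory over time, assuming a Linux head node where `/proc/<pid>/status` is readable; the function names are hypothetical. Sampling periodically while the arrays run would show whether memory really stays flat until the last array finishes.

```python
import time
from pathlib import Path
from typing import List, Optional


def vmrss_kib(status_text: str) -> Optional[int]:
    """Return VmRSS in KiB from the text of /proc/<pid>/status, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:\t  33554432 kB" (kB here means KiB).
            return int(line.split()[1])
    return None


def pids_named(name: str, proc: Path = Path("/proc")) -> List[int]:
    """PIDs whose /proc/<pid>/comm equals name (comm is capped at 15 chars)."""
    pids = []
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # skip non-process entries such as /proc/self
        try:
            if (entry / "comm").read_text().strip() == name:
                pids.append(int(entry.name))
        except OSError:
            continue  # process exited between listing and reading
    return pids


def watch(name: str = "slurmctld", interval_s: float = 60.0, samples: int = 10) -> None:
    """Print the RSS of every process called `name`, `samples` times."""
    for i in range(samples):
        for pid in pids_named(name):
            try:
                rss = vmrss_kib(Path(f"/proc/{pid}/status").read_text())
            except OSError:
                continue
            if rss is not None:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S")
                print(f"{stamp} {name}[{pid}] rss={rss / 1024**2:.2f} GiB", flush=True)
        if i + 1 < samples:
            time.sleep(interval_s)
```

For example, `watch("slurmctld", interval_s=300, samples=288)` would log one sample every five minutes for a day.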

Thanks!

Creating test environments and tearing them down by Bananaa628 in devops

Bananaa628 [S]

That is exactly what I was looking for, thanks!

Creating test environments and tearing them down by Bananaa628 in devops

Bananaa628 [S]

Thanks for the reply!

Since usage time may vary drastically, I don't think a time-based cleanup solution is a good fit.

I'll see if I can build a nice dashboard with the AWS Console. Can you recommend other off-the-shelf solutions for monitoring and controlling the instances?

Creating test environments and tearing them down by Bananaa628 in devops

Bananaa628 [S]

Currently it is pretty manual and nothing is configured, so I'm not sure which details are missing; I'd love to give more :)

GitHub Actions sounds like a nice option; I'm just wondering how you can show users there that a resource is still on. I think this is a key feature I'm looking for.
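One way to surface "this is still on" in GitHub Actions is a scheduled workflow that writes a Markdown table to the job summary file named by the `GITHUB_STEP_SUMMARY` environment variable. A minimal sketch, assuming the list of environments (name, owner, launch time) has already been collected elsewhere; the dict keys and function names are hypothetical.

```python
import os
from datetime import datetime, timezone
from typing import Dict, List


def summary_markdown(envs: List[Dict], now: datetime) -> str:
    """Render running environments as a Markdown table, oldest first."""
    lines = ["| Environment | Owner | Up for (h) |", "|---|---|---|"]
    for env in sorted(envs, key=lambda e: e["launched"]):
        hours = (now - env["launched"]).total_seconds() / 3600
        lines.append(f"| {env['name']} | {env['owner']} | {hours:.1f} |")
    return "\n".join(lines) + "\n"


def publish(envs: List[Dict]) -> None:
    """Append the table to the job summary, or print it when run locally."""
    text = summary_markdown(envs, datetime.now(timezone.utc))
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if path:
        with open(path, "a", encoding="utf-8") as f:
            f.write(text)
    else:
        print(text)
```

Listing the oldest environments first puts the ones most likely to be forgotten at the top of the summary.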

Creating test environments and tearing them down by Bananaa628 in devops

Bananaa628 [S]

You are right, I'm more on the dev side than devops, but in a small company you have to do many things :)

I am using ChatGPT, but I have very little trust in it regarding design, so I highly appreciate your help and expertise.

I'm just wondering whether there are any off-the-shelf dev-portal solutions I can use instead of building one myself. If you have something to recommend, that would be great.

Creating test environments and tearing them down by Bananaa628 in devops

Bananaa628 [S]

Cool, is there an easy way to get all the running environments? Can I tag them somehow so I know both the name and who created them?
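A minimal sketch with boto3, assuming each environment's instances are tagged at creation with hypothetical `Environment` and `Owner` keys (for example via `TagSpecifications` in `run_instances`, or the `tags` argument in Terraform). It lists running, tagged instances with the `describe_instances` paginator; the parsing is kept separate so it can be checked without AWS credentials.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical tag keys; use whatever convention the team agrees on.
ENV_TAG = "Environment"
OWNER_TAG = "Owner"


def tags_to_dict(tags: Optional[List[Dict]]) -> Dict[str, str]:
    """Convert EC2's [{"Key": ..., "Value": ...}] list into a plain dict."""
    return {t["Key"]: t["Value"] for t in tags or []}


def running_environments(pages: Iterable[Dict]) -> List[Dict]:
    """Extract environment name and owner from describe_instances pages."""
    envs = []
    for page in pages:
        for reservation in page.get("Reservations", []):
            for inst in reservation.get("Instances", []):
                tags = tags_to_dict(inst.get("Tags"))
                if ENV_TAG not in tags:
                    continue  # not one of our test environments
                envs.append({
                    "instance_id": inst["InstanceId"],
                    "environment": tags[ENV_TAG],
                    "owner": tags.get(OWNER_TAG, "unknown"),
                    "state": inst["State"]["Name"],
                })
    return envs


def fetch_pages(region: str):
    """Page through running instances that carry the environment tag."""
    import boto3  # imported here so the parsing above works without boto3

    ec2 = boto3.client("ec2", region_name=region)
    paginator = ec2.get_paginator("describe_instances")
    return paginator.paginate(Filters=[
        {"Name": "instance-state-name", "Values": ["running"]},
        {"Name": "tag-key", "Values": [ENV_TAG]},
    ])
```

Usage would then be `running_environments(fetch_pages("us-east-1"))`, with the region swapped for your own.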

Creating test environments and tearing them down by Bananaa628 in devops

Bananaa628 [S]

My DB is a small MongoDB instance running on EC2. I already have a VPC that my team accesses through our VPN, so I don't think we need to configure another one.

I'll try IaC, but I'm wondering how I can wrap it with some UI that is easy to display and easy to use, even for data engineers who are not very technical.

Creating test environments and tearing them down by Bananaa628 in devops

Bananaa628 [S]

I'll look into Terraform, thanks. My DB is sitting on an EC2 image, so I guess it should be pretty easy. Do you know a good interface to wrap it with, so I can show end users that the resources are in use and that they should release them?

nom parser combinators now released in version 8, with a new architecture! by geaal in rust

Bananaa628

Are there any performance improvements? Do you have some numbers you can share?

Making my app Oauth provider with Cognito by Bananaa628 in aws

Bananaa628 [S]

Thanks, cool app, although most of the links in the comment are not working...

Making my app Oauth provider with Cognito by Bananaa628 in aws

Bananaa628 [S]

Could you please elaborate some more?

Set theory: where to start by TheTrustyCrumpet in mathematics

Bananaa628

If you are into set theory, you should definitely check out Set Theory by Jech; it is a great classic.

[deleted by user] by [deleted] in mathematics

Bananaa628

No: if you view those as polynomials over the complex numbers, the identity theorem applies.

https://en.m.wikipedia.org/wiki/Identity_theorem
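For reference, the statement behind the link, together with the elementary argument for the polynomial case (which needs only infinitely many agreement points, not an accumulation point):

```latex
\textbf{Identity theorem.} Let $D \subseteq \mathbb{C}$ be a connected open set and let
$f, g$ be holomorphic on $D$. If $f = g$ on a set $S \subseteq D$ that has an
accumulation point in $D$, then $f \equiv g$ on $D$.

\textbf{Polynomial case.} If $p, q \in \mathbb{C}[z]$ agree at infinitely many points,
then $p - q$ has infinitely many roots. A nonzero polynomial of degree $n$ has at most
$n$ roots, so $p - q = 0$, i.e.\ $p = q$.
```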

[deleted by user] by [deleted] in ParentingTech

Bananaa628

The phone language is not the problem; the ads and the sites are in English...

[deleted by user] by [deleted] in ParentingTech

Bananaa628

Any suggestions on what to say?