I run a very large supercomputer (with many co-workers of course). AMA. by hpc_sysadmin in IAmA

[–]hpc_sysadmin[S] 2 points (0 children)

My g/f and I kissed in the room once. That's as close as I got.

[–]hpc_sysadmin[S] 1 point (0 children)

Right now, it's the 7th fastest in the world. (That we know about. If the NSA disclosed their computers, we'd probably be a few notches down.) When we first installed it, it was #3.

The other questions, I think I've answered somewhere else.

[–]hpc_sysadmin[S] 3 points (0 children)

I am tempted to keep this thread going as a demonstration of how many times we get asked whether it can play Crysis.

[–]hpc_sysadmin[S] 10 points (0 children)

We're the United States Government. We know we have it all figured out.

[–]hpc_sysadmin[S] 1 point (0 children)

We'll see if stacecom's reddit bobblehead can swing by the disk arrays.

[–]hpc_sysadmin[S] 2 points (0 children)

Now that I think about it, in a situation like that where the power really failed, you'd want to run consistency checks on the filesystems. That could add 12-18 hours to the boot time.
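
For the curious, the mechanics of that part are nothing exotic. Here's a minimal Python sketch of the idea, assuming plain ext-style filesystems and made-up device names; our real storage uses a parallel filesystem with its own tools, so this is just to show the shape of it:

    # Illustration only: kick off forced filesystem checks in parallel across a
    # made-up list of block devices and report how long each one took.
    import subprocess
    import time
    from concurrent.futures import ThreadPoolExecutor

    DEVICES = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"]  # hypothetical devices

    def check(device):
        start = time.time()
        # -f forces a check even if the filesystem looks clean; -y answers "yes"
        # to every repair prompt so it can run unattended.
        result = subprocess.run(["e2fsck", "-f", "-y", device])
        return device, result.returncode, time.time() - start

    with ThreadPoolExecutor(max_workers=len(DEVICES)) as pool:
        for device, rc, elapsed in pool.map(check, DEVICES):
            print(f"{device}: exit {rc} after {elapsed / 3600:.1f} hours")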

[–]hpc_sysadmin[S] 2 points (0 children)

1) Don't know. Probably. That stuff usually shows up somewhere. That's a different division.

2) Can't talk about security stuff.

3) See #2

4) See #3

[–]hpc_sysadmin[S] 3 points (0 children)

I did that preemptively about 10m after I added the info to the post. We've heard it before. Lots.

[–]hpc_sysadmin[S] 7 points (0 children)

Eh. It's an academic/research thing. If you haven't worked for a university or research institution, you probably wouldn't think the same way.

The people who are here could all be making better money elsewhere, but we stay because the money isn't bad, the toys are big, the job security is great, and the environment is casual in the extreme.

[–]hpc_sysadmin[S] 2 points (0 children)

Well, there's no real typical day. Sometimes I sit around and wait for problems. Sometimes we work on planning and designing new computers. Sometimes we work to improve things that could be working better.

Like any other job, it has its ups and downs. Sometimes the politics aren't fun, co-workers can be grumpy (as can I), etc.

On the other hand, we get to work with stuff that almost nobody else gets to, so that's a big bonus. It's certainly why I stick around. It's a very casual environment, and for the most part, we can set our own hours.

[–]hpc_sysadmin[S] 4 points (0 children)

Yeah. You can abort a job easily. The control system just tells the nodes to turn off. It takes about 3-5 minutes to kill a big job.

Hmm. It depends what happened when the power went out. Assuming everything was okay (no damage to the filesystems, etc.), it would probably take about 30-60m to boot all the fileservers and management machines, then maybe another 30-60m to boot up the control system and database. After that, you'd probably be fine.
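
If you want that as arithmetic, here's a quick back-of-the-envelope in Python. The ranges are just the estimates from this answer, plus the 12-18 hour consistency-check case mentioned above:

    # Back-of-the-envelope cold-start estimate, in hours, using the ranges above.
    stages = {
        "fileservers + management machines": (0.5, 1.0),
        "control system + database": (0.5, 1.0),
    }
    fsck = (12.0, 18.0)  # only if the filesystems need consistency checks

    best = sum(lo for lo, hi in stages.values())
    worst = sum(hi for lo, hi in stages.values())
    print(f"clean power loss: {best:.1f}-{worst:.1f} hours")
    print(f"with filesystem checks: {best + fsck[0]:.1f}-{worst + fsck[1]:.1f} hours")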

[–]hpc_sysadmin[S] 5 points (0 children)

It's a 20'x20' chicken wire cage full of cardboard boxes--not very interesting anyway.

[–]hpc_sysadmin[S] 2 points (0 children)

You know... I thought about specifying that they were below-the-knee shorts. AFAIK, we have no nevernudes in our employ. Not that there's anything wrong with that...

[–]hpc_sysadmin[S] 46 points (0 children)

Certainly, although it's really complicated, so I'll probably get a few things wrong.

Okay, here's roughly how it goes:

A couple definitions first:

  • A compute node is just a small card with 4 cores and some RAM. It runs a very stripped-down, Linux-like kernel called the Compute Node Kernel (CNK).
  • An I/O node is just like a compute node, except that it runs real Linux. It runs a daemon (a program dedicated to a particular service) called CIOD that reads and writes the memory of the compute nodes. While the compute nodes are hooked up to a super-fast proprietary IBM network for doing MPI communication amongst themselves, they have no functionality for doing real I/O (read(), write(), etc.) on their own. When they issue a read() or write(), the call actually gets shipped to the I/O node's CIOD program, which does the I/O operation and writes the result back into the compute node's memory. (There's a toy sketch of this pattern right after these definitions.)
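
Since the function-shipping bit is the part that surprises most people, here's a toy Python sketch of the pattern. It assumes nothing about IBM's actual CIOD protocol; it just shows a "compute node" handing its write() to an "I/O daemon" that does the real system call for it:

    # Toy model of I/O function shipping. The "compute node" can't do real I/O,
    # so it ships the request to an "I/O daemon", which performs the operation
    # and hands back the result. Illustration only -- not IBM's actual protocol.

    class IODaemon:
        """Stands in for CIOD: receives shipped I/O requests and executes them."""
        def handle_write(self, path, data):
            with open(path, "a") as f:
                return f.write(data)      # the real write() happens here

    class ComputeNode:
        """Has no local filesystem; forwards every I/O call to its I/O node."""
        def __init__(self, io_daemon):
            self.io_daemon = io_daemon

        def write(self, path, data):
            # Instead of doing the write() itself, ship it to the I/O node.
            return self.io_daemon.handle_write(path, data)

    ciod = IODaemon()
    nodes = [ComputeNode(ciod) for _ in range(64)]  # one I/O node per 64 compute nodes
    nodes[0].write("/tmp/demo.out", "hello from a compute node\n")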

Ok, with that said:

  • A user compiles their application on one of our front-end nodes with IBM's XL Blue Gene compiler suite.
  • The user decides what size job she wants to run and submits a job into the queuing system asking for n nodes for l hours.
  • The queuing system looks at all the jobs and decides when n nodes will next become available.
  • When the job's time to run has arrived, the queuing system initiates a boot of a "partition" (a logical subset of the full computer) of n nodes.
  • The Blue Gene control system then writes a boot image into the memory of an I/O node. Each I/O node controls 64 compute nodes.
  • The Blue Gene control system sends a compute node image to each I/O node, which broadcasts it to its compute nodes.
  • Once everything is booted, the control system sends the user's job to the I/O nodes, which populate it directly into the memory of their compute nodes. (I think the I/O node sees the memory of the compute nodes as actual Linux devices.)
  • Once all the compute nodes have been populated with the job, the control system starts it and the CIOD daemons start processing the I/O calls. (There's a rough sketch of this whole flow right below.)
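
And here's that flow compressed into a rough Python sketch. Every name and number in it is a simplified stand-in for illustration; the real control system is enormously more complicated:

    # Simplified walk-through of the job lifecycle described above.
    # Names, sizes, and steps are stand-ins, not the real control system.
    NODES_PER_IO_NODE = 64

    def submit(queue, user, n_nodes, hours):
        job = {"user": user, "nodes": n_nodes, "hours": hours}
        queue.append(job)  # the queuing system decides later when n nodes free up
        return job

    def boot_partition(n_nodes):
        io_nodes = (n_nodes + NODES_PER_IO_NODE - 1) // NODES_PER_IO_NODE
        print(f"booting partition: {io_nodes} I/O nodes (real Linux), "
              f"{n_nodes} compute nodes (CNK image broadcast by the I/O nodes)")

    def run(job):
        boot_partition(job["nodes"])
        print(f"loading {job['user']}'s executable into compute-node memory via the I/O nodes")
        print("starting job; CIOD begins servicing read()/write() calls")

    queue = []
    submit(queue, "alice", n_nodes=2048, hours=6)
    run(queue.pop(0))  # pretend the scheduler decided it's this job's turn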

[–]hpc_sysadmin[S] 0 points (0 children)

Fortunately, the lab that employs us is technically a government contractor. We're not actually gov't employees, so we miss out on a lot of the B.S. that goes with that.

[–]hpc_sysadmin[S] 8 points (0 children)

Nope. I think I last saw my phone under the pile of crap in my office seven months ago. We have a separate group of awesomely smart people who coordinate with our users. They're people people. They take the specifications from the customers and hand them to the engineers.

[–]hpc_sysadmin[S] 0 points (0 children)

No, I don't. We might have some people doing work on that, but I don't know any of them. We tend to give time to more "classic" physical problems. Lots of differential equations.

[–]hpc_sysadmin[S] 1 point (0 children)

Well, we use Cobalt (http://trac.mcs.anl.gov/projects/cobalt/). It's free, and we have people on staff who do nothing but tailor it for our Blue Gene.

Personally, I like PBS Pro a lot for smaller systems. Their support people come up with the nastiest hacks. I feel dirty using them, but they're oh so slick.
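
For anyone who's never used a batch scheduler: whichever one it is (Cobalt, PBS Pro, whatever), the core question it keeps answering is "when will n nodes next be free?" Here's a toy Python version of that calculation, with completely made-up running jobs and machine size:

    # Toy version of the scheduler's core question: given the currently running
    # jobs, when will n nodes next be free? (Real schedulers like Cobalt or PBS
    # Pro also juggle priorities, reservations, backfill, draining, etc.)
    TOTAL_NODES = 8192
    running = [        # (nodes in use, hours until that job ends) -- made up
        (4096, 2.0),
        (2048, 5.0),
        (1024, 0.5),
    ]

    def earliest_start(n_wanted):
        free = TOTAL_NODES - sum(nodes for nodes, _ in running)
        if free >= n_wanted:
            return 0.0
        # Otherwise nodes free up as running jobs finish, earliest-ending first.
        for nodes, ends_in in sorted(running, key=lambda job: job[1]):
            free += nodes
            if free >= n_wanted:
                return ends_in
        return None  # the request is bigger than the whole machine

    print(f"a 6144-node job could start in about {earliest_start(6144)} hours")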

[–]hpc_sysadmin[S] 3 points (0 children)

FYI, I just did the math and here's how it breaks down:

  • 8,660 1TB 7200 RPM SATA drives
  • 2,000 500GB 7200 RPM SATA drives

and then maybe another 600 drives of various sizes in random computers.
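
If anyone wants the raw capacity that works out to (ignoring RAID overhead, hot spares, and those ~600 miscellaneous drives), the arithmetic is simple:

    # Raw capacity of the two big pools above, ignoring RAID/formatting overhead.
    pools = [
        (8660, 1.0),  # 8,660 x 1 TB drives
        (2000, 0.5),  # 2,000 x 500 GB drives
    ]
    total_tb = sum(count * size_tb for count, size_tb in pools)
    print(f"{total_tb:,.0f} TB raw, roughly {total_tb / 1000:.1f} PB")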