Originally Published: Wednesday, 11 July 2001 Author: Sean The RIMBoy Jewett
Published to: enhance_articles_sysadmin/Sysadmin

NCSA: Supporting the Revolution of Linux High Performance Computing

Regular Linux.com contributor Sean Jewett recently got the chance to attend the NCSA conference "Linux Clusters: The HPC Revolution" and caught up on some exciting work. He sent us this report.


A couple of months back a question came up at work: "What is NCSA up to these days?" The general Internet community has not heard much lately from the folks who made the reference telnet client for Mac and DOS, or Mosaic, the first ubiquitous browser. A visit to NCSA's website quickly answered the question. Among other things, NCSA was hosting a conference entitled "Linux Clusters: The HPC Revolution", held June 25-27, 2001.

Not much was on the site to begin with, just a call for papers and the date, but it looked like it might be interesting. It was NCSA, after all. If you could only see one conference on high-performance computing -- Linux clusters, no less -- NCSA would be the people to see. After assessing the costs and figuring the travel expenses from Nashville, TN, it was agreed that attendance would be a benefit to the department. It was just a matter of taking care of the paperwork and making the drive.

According to Mr. John Towns, the Division Director of the Scientific Computing Division for NCSA, the conference targeted the high-performance computing arena. While primarily an academically oriented event, the conference also had participation from the various industries that support high-performance computing. For a conference designed specifically for those areas, John said, the organizers were quite surprised to have to turn away papers. The attendance goal of 150 people was more than met, with 179 attendees. People converged on Urbana-Champaign in Illinois from as far away as Denmark, Italy, Thailand, Puerto Rico, and Montana (wait, that's in the US!).

Monday set the pace of the conference. The day was reserved for tutorials, and attendees were presented with two tracks. Those who chose the presentation called "Cluster-In-A-Box" quickly found out what NCSA is up to. Having realized that the tools to help the average user build a Linux cluster are either not available or too rough to use, NCSA along with IBM, Intel, ORNL and some other industry heavyweights have teamed up to form the Open Cluster Group. The first project of this group is OSCAR: Open Source Cluster Application Resources.

Built on IBM's LUI (Linux Utility for cluster Installation), OSCAR aims to make the installation of a Linux cluster as simple, and yet as configurable, as needed. Those who have used the Scyld Beowulf installer have an idea of how easy Linux clustering can be. While OSCAR has not yet reached the ease of installation that Scyld has achieved, the Open Cluster Group seems driven to narrow the gap and build a better cluster.

By building upon IBM's LUI, the Open Cluster Group has a good foundation. LUI enables users to easily push out configuration changes such as new kernels. OSCAR then adds the C3 cluster management tool from ORNL, MPICH (Message Passing Interface) and PVM (Parallel Virtual Machine) for message passing, and OpenSSH/SSL for secure transactions, and manages the job queue with OpenPBS. Although the tools are based on RedHat 6.2, a RH 7.1 based version (OSCAR 1.1) is expected soon. The nice part about OSCAR is that it does not lock you into certain kernels and configurations, which Scyld's software does to a certain extent. OSCAR can use the stock RedHat kernel or can push out kernels of your choosing. The Open Cluster Group wants all of the tools and software it develops and distributes to be under an open-source license. This avoids the complications users face with closed-source or proprietary clustering solutions.

Other nifty features of OSCAR include Ethernet PXE support to enable automated installation of nodes via the network. The Group has also put together a full-featured installation manual to walk users through the setup. The manual hopefully leaves no questions unanswered, a common complaint from people unfamiliar with open-source projects. With screen shots, the manual is twenty-four pages and comes complete with troubleshooting suggestions.

Monday afternoon saw presentations on "Performance Tuning for IA32 and IA64 based Clusters" from the Ohio Supercomputer Center and "Low-Cost Linux Clusters for Biomolecular Simulations Using NAMD" from the University of Illinois (UIUC). At some point during the day word got out about the beast NCSA was keeping in its operations center. Of course, all of this talk of clustering had everyone itching to see what NCSA had. NCSA staff thankfully satisfied the appetite of the attendees and threw together a tour of the facility. As with the Monolith in 2001, everyone huddled around NCSA's top-ranked cluster, dubbed Platinum. Unlike its name, but like the Monolith, this system was jet black. IBM black, to be precise: 512 nodes of dual Pentium III power with a paltry 1.5 gigs of RAM per node, plus Myricom's Myrinet and fast Ethernet interconnects. This cluster is built for speed. Ranked 30th in the Top 500 Supercomputer Sites, NCSA's Platinum is currently the fastest known unclassified Linux cluster in the world.
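To put those per-node figures in perspective, here is a minimal sketch that totals them across the machine. It uses only the numbers quoted above (512 nodes, two CPUs and 1.5 GB of RAM each); the function name cluster_totals is made up for illustration and does not come from NCSA.

```python
# Figures quoted in the article for NCSA's Platinum cluster.
NODES = 512
CPUS_PER_NODE = 2        # dual Pentium III
RAM_PER_NODE_GB = 1.5


def cluster_totals(nodes, cpus_per_node, ram_per_node_gb):
    """Return (total CPUs, total RAM in GB) for a homogeneous cluster."""
    return nodes * cpus_per_node, nodes * ram_per_node_gb


total_cpus, total_ram_gb = cluster_totals(NODES, CPUS_PER_NODE, RAM_PER_NODE_GB)
print(f"{total_cpus} CPUs, {total_ram_gb:.0f} GB RAM in aggregate")
```

"Paltry" per node still adds up to 1,024 processors and 768 GB of memory across the cluster.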

Tuesday marked the start of the conference proper: the presentation of papers, and the networking of minds. Dan Reed, Director of NCSA and the Alliance, kicked off the event by highlighting the Grid and large-scale computation displays. Intel's Tim Mattson started events Wednesday morning by discussing where clustering started and where it is going. Tim admitted that he worked on a cluster of 286s in the early 1980s and understood then the potential we are only realizing now. Topics covered in the presentations on Tuesday and Wednesday were fairly diverse; the full schedule is available at http://www.ncsa.edu/LinuxRevolution/schedule.htm.

Putchong Uthayopas of Kasetsart University in Thailand demonstrated the SCE: a software tool for Beowulf clusters. The presentation was simply amazing. Putchong's group demonstrated very polished and refined cluster installation and management tools. And don't tell Putchong's group that VRML is dead. Attendees were blown away with his group's VRML tools for polling information on nodes. You've never seen a cluster until you have witnessed it in 3D (in particular the filesystem). As Putchong put it, 'the people that give you the money don't necessarily understand what it is you're doing. They do like graphics though'. Of course, Putchong stressed that SCE is configurable to the needs of the user, and schedulers such as PBS could be plugged in if needed.

Steven Timm gave an overview of what computing is like at Fermilab. After hearing the volumes of data Fermilab generates, you come to realize that terabytes of storage are quite blasé. It's the petabytes they're after. However, it was Princeton Asst. Professor of Geophysics Hans-Peter Bunge who drove home a very important lesson about building Linux clusters: for Bunge, the point of a faster Linux cluster is not to run coarse simulations faster, it is to run higher-resolution simulations in a similar timeframe. His Geowulf System is a great example of finding the best performance for the lowest price.
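Bunge's point can be made concrete with a little arithmetic. The sketch below is a back-of-the-envelope illustration, not Bunge's model: it assumes the cost of a simulation grows as a power of the grid resolution, and the default exponent of 4 (three spatial dimensions plus a time step that shrinks with the grid spacing) is an assumption chosen for the example. The function name resolution_gain is likewise hypothetical.

```python
def resolution_gain(speedup, cost_exponent=4.0):
    """Factor by which grid resolution can grow at fixed wall-clock time.

    Assumes (for illustration only) that simulation cost scales as
    resolution ** cost_exponent, so a machine that is `speedup` times
    faster affords a resolution speedup ** (1 / cost_exponent) times finer.
    """
    if speedup <= 0 or cost_exponent <= 0:
        raise ValueError("speedup and cost_exponent must be positive")
    return speedup ** (1.0 / cost_exponent)


# How much finer a grid fits in the same run time on a faster cluster.
for speedup in (2, 16, 256):
    gain = resolution_gain(speedup)
    print(f"{speedup:>3}x faster cluster -> {gain:.2f}x finer resolution")
```

Under that assumption, even a modest gain in resolution takes a large jump in compute power, which is why price/performance matters so much when sizing a cluster like Geowulf.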

The presentation on Linux cluster security by Neil Gorsuch, of NCSA's Clustering Group, drove home the need for an easily manageable site-wide security configuration tool. In academia the goals and needs of researchers outweigh what any business would consider standard security practice. As a result, some clusters must have all of their nodes publicly accessible. Trey White of ORNL discussed issues faced when using proprietary clustering solutions. Apparently, getting different high-end vendors' clustering solutions to play nice across hardware platforms can be slightly difficult.

Support for the conference came from the NCSA Alliance, along with IBM, Intel and Myricom. As a result, several of the vendors gave presentations. Chuck Seitz -- CEO and CTO of Myricom -- went into the science of high-performance networking with "Dispersive Routing in Clos Networks". Tim Mattson discussed Intel's work in cluster computing while Luiz DeRose shared information on the Linux cluster tools being developed by IBM's Advanced Computing Technology Center.

NCSA did a nice job in planning the conference. Presenters were given plenty of time not only to show their work but also to answer questions as needed. Likewise, downtime between presentations gave people a chance to network and mingle. Lunch was provided, which kept the focus on the conference, and dinner was entertaining. Entertainment for Tuesday's dinner included an indoor mini-golf course and a mechanical bull. Perhaps not what you'd expect from a bunch of people discussing gigs and high-performance networking, but it was a hit. John Towns commented that the computing atmosphere reminds him of the early 80s in terms of the close collaboration between the research institutions and the Crays and CDCs. Those close collaborations are now being paralleled with Linux cluster solution providers, particularly at the high end. John said the level of collaboration taking place was refreshing and called the conference a success because of that.

Sean "The RIMBoy" Jewett is a Systems Administrator for the Center for Structural Biology at Vanderbilt University. By day he wrangles Linux and SGI; by evening he maintains his own floppy distro and is the VP of NLUG. The Center sponsored his attendance at NCSA's conference.




