LCDB: A First Look at Linux Clustering

Originally published: Monday, 2 April 2001	Author: Satyakam Goswami
Published to: featured_articles/Featured Articles	Page: 1/1 [Printable]
A First Look at Linux Clustering Explore the world of powerful supercomputers made from cheap hardware using Linux and clustering technologies like Beowulf, MOSIX and OSCAR.

Page 1 of 1

What is Clustering?

Clustering can be loosely defined as connecting two or more computers together in such a way that they behave like a single computer. Clustering is a popular strategy for implementing parallel processing applications because it enables companies to leverage an existing investment in PCs and workstations. In addition, it's relatively easy to add new CPUs to a cluster, simply by adding a new PC to the network. In this article we'll take a first look at the world of Linux clustering, and explore some of your options if you are interested in creating your own cluster.

When I began working with clusters the word "clustering" reminded me of Cray and the other supercomputers people buy, and with good reason. I started looking around for resources on the Web to make my own cluster, and perhaps save millions of dollars. To my surprise, there were many different clustering variations available for Linux.

As you may know Linux runs on almost any platform from x86 to RISC, In fact, the first Linux clusters - built at NASA - were made from a collection of old DX4 486's. The project was named Beowulf archive.org.

Beowulf

The most obvious choice for a Linux cluster is Beowulf. So, what is Beowulf?

Beowulf is a multi-computer architecture that can be used for parallel computations. The Beowulf system usually consists of one server node, and one or more client nodes connected together via Ethernet or some other network. It is a system built using commodity hardware components, like any PC capable of running Linux, standard Ethernet adapters, and switches. It does not contain any custom hardware components and is trivially reproducible. Beowulf also uses commodity software like the Linux operating system, Parallel Virtual Machine (PVM) and the Message Passing Interface (MPI). The server node controls the whole cluster and serves files to the client nodes. It is also the cluster's console and gateway to the outside world.

Large Beowulf machines might have more than one server node, and possibly other nodes dedicated to particular tasks such as consoles or monitoring stations. In most cases client nodes in a Beowulf system are dumb, the dumber the better. Nodes are configured and controlled by the server node, and do only what they are told to do. In a disk-less client configuration, the client nodes don't even know their IP address or name until the server tells them what it is. One of the main differences between Beowulf and a Cluster of Workstations (COW) is the fact that Beowulf behaves more like a single machine rather than many workstations. In most cases client nodes do not have keyboards or monitors, and are accessed only via remote login or possibly by serial terminal. Beowulf nodes can be thought of as a CPU + memory package which can be plugged in to the cluster, just like a CPU or memory module can be plugged into a motherboard.

Beowulf is not a special software package, new network topology or the latest kernel hack. Beowulf is a technology of clustering Linux computers to form a parallel, virtual supercomputer. Although there are many software packages such as kernel modifications, PVM and MPI libraries, and configuration tools that make the Beowulf architecture faster, easier to configure, and much more usable, one can build a Beowulf class machine using standard Linux distributions without any additional software. If you have two networked Linux computers which share at least the /home file system via NFS, and trust each other to execute remote shells (rsh), then it could be argued that you have a simple, two-node Beowulf machine already.

Classes of Beowulf Clusters

CLASS I BEOWULF:

This class of machine is built entirely from commodity "off-the-shelf" parts. We shall use the "Computer Shopper" certification test to define commodity "off-the-shelf" parts. (Computer Shopper is a 1-inch thick monthly magazine/catalog of PC systems and components.) The test is as follows: A CLASS I Beowulf is a machine that can be assembled from parts found in at least 3 nationally/globally circulated advertising catalogs such as Computer Shopper.

The advantages of a CLASS I systems are: hardware is available from multiple sources (low prices, easy maintenance); no reliance on a single hardware vendor; driver support from the Linux community; based on standards (SCSI, Ethernet, etc.).

The disadvantage of a CLASS I system is: best performance may require CLASS II hardware

CLASS II BEOWULF:

A CLASS II Beowulf is simply any machine that does not pass the Computer Shopper certification test. This is not a bad thing. Indeed, it is merely a classification of the machine.

The advantages of a CLASS II system are:

Performance can be quite good! Performance can even be better than a CLASS I Beowulf system.

The disadvantages of a CLASS II system are: driver support may vary; reliance on a single hardware vendor for some components; may be more expensive than CLASS I systems.

One CLASS is not necessarily better than the other. It all depends on your needs and budget. This classification system is only intended to make discussions about Beowulf systems a bit more succinct.

Beowulf Versus a Cluster of Workstations (COW)

Clustering idle computer laboratories in a University is a perfect example of a Cluster of Workstations (COW). What is so special about Beowulf, and how is it different from a COW? The truth is that there is not much difference, but Beowulf does have a few unique characteristics.

Client nodes in a Beowulf cluster do not have keyboards, mice, video cards or monitors. All access to the client nodes is done via remote connections from the server node, dedicated console node, or a serial console.

The client nodes use private IP addresses like the 10.0.0.0/8 or 192.168.0.0/16 address ranges (RFC 1918 http://www.alternic.net/rfcs/1900/rfc1918.txt.html archive.org) and do not have their own connection to the Internet or the institution's Wide Area Network (WAN).

On the server node, users can edit and compile their code, and also spawn jobs on all nodes in the cluster.

Beowulf also has more single-system image features that help users to see the Beowulf cluster as a single computing workstation.

In most cases COWs are used for parallel computations at night and over weekends when people do not actually use the workstations for every day work, thus utilizing idle CPU cycles. Beowulf, on the other hand, is usually considered a machine dedicated to parallel computing, and optimized for that purpose. Beowulf also gives a better price/performance ratio as it is built from off-the-shelf components and mainly runs free software.

MOSIX

MOSIX archive.org is a software package that was specifically designed to enhance the Linux kernel with cluster computing capabilities. The core of MOSIX is adaptive (on-line) load balancing, memory ushering, and file I/O optimization algorithms that respond to variations in the use of the cluster's resources, such as uneven load distribution or excessive disk swapping in one of the nodes due to lack of free memory . In such cases, MOSIX initiates process migration from one node to another, to balance the load, or to move a process to a node that has sufficient free memory or to reduce the number of remote file I/O operations.

MOSIX operates silently and its operations are transparent to the applications. This means that users can execute sequential and parallel applications just like in an SMP. Users need not care about where processes are running, nor be concerned what other users are doing. Shortly after the creation of a new process MOSIX attempts to assign it to the best available node at that time. MOSIX monitors all the processes, and if necessary migrates processes among the nodes to maximize the overall performance of the cluster. All this is done without changing the Linux interface. This means that you continue to see (and control) all your processes as if they are running on your login node.

The algorithms of MOSIX are decentralized - each node is both a master for processes that were created locally, and a server for (remote) processes, that migrated from other nodes. This means that nodes can be added or removed from the cluster at any time, with minimal disturbances to running processes. Another useful property of MOSIX is its monitoring algorithms that detect the speed of each node, its load and free memory, as well as the IPC and I/O rates of each process. This information is used to make near optimal process allocation decisions.

So far MOSIX has been developed seven times, for different versions of Unix and other architectures. It has been used as a production system for many years. The first PC version was developed for BSD/OS. The latest version is for Linux on Intel platforms.

MOSIX can support configurations with a large numbers of computers, with minimal scaling overheads to impair performance. A low-end configuration could include several PCs that are connected by Ethernet, while a larger configuration might include workstations and servers that are connected by higher speed LAN, e.g., Fast Ethernet. A high-end configuration may include a large number of SMP and non-SMP workstations and servers that are connected by a high performance LAN, e.g. Gigabit-Ethernet.

Summary

There are a plethora of choices available in the Linux clustering world. The first choice is deciding whether to call a commercial vendor like VA Linux archive.org, TurboCluster archive.org, or HP to build your cluster, or make your own, as we have discussed here. After all, it's free world!

Clusters can be used for general high performance computing, high-performance Web servers, flight simulators, virtual world applications, AI expert systems or even large MUD's. Virtually any application you can think of can benefit from a hardware cluster.

As a final note, there is a new kid on the block by the name OSCAR.OSCAR is a snapshot of the best-known methods for building, programming, and using clusters. It consists of a fully integrated and easy to install software bundle designed for high performance cluster computing. Everything needed to install, build, maintain, and use a modest Linux-based cluster is included in the suite, making it unnecessary to download or even install any individual software packages. OSCAR is the first project by the Open Cluster Group. For more information on the group and its projects, visit the website http://www.OpenClusterGroup.org archive.org.

Linux clusters are here to stay. Like Linux itself they are playing an important role in our daily life without our even being aware of them. For example google.com (my favorite search engine) runs a Linux cluster of around seventy thousand machines! That's one of the reasons why Google is so fast.