Originally Published: Wednesday, 25 April 2001
Author: Brad Marshall
Published to: enhance_articles_sysadmin/Sysadmin

Linux Clustering - Advanced Concepts

Brad Marshall takes a look at the business of clustering, and what tools are available to reduce cluster failure.

In the previous article in this series, we covered Beowulf systems, clusters of workstations, and MOSIX. This article focuses on the 'business' side of clustering: virtual servers and high availability. Along the way we will also discuss some strategies for addressing single points of failure, and some technologies that help in that area.

Virtual Servers

Good examples of virtual servers are LVS (Linux Virtual Server), Squid in accelerator mode, Cisco LocalDirector, or any of the Apache front-ends to Java application servers, such as mod_jk in front of Tomcat. These all work by having a front-end server that talks to multiple back-end servers.

While there is a single point of failure here (the front-end server), the load balancing inherent in the system allows the service to handle a much higher volume of requests than would be possible with a single server. Additionally, you can add and remove servers from the pool without affecting a client's ability to get service. This allows upgrades to happen piece-by-piece, crashed servers to be removed without downtime for the entire system, and increased demand to be met by simply adding more servers.

Linux Virtual Server

Linux Virtual Server, as it currently stands, is basically a layer four switch. This means LVS routes packets between the client and servers without knowing the content of the packets it is routing. Making decisions based on the content, which is called layer seven switching, would allow session management or services based on that content, but LVS does not currently do this.

The front-end server, also known as the director, routes packets to the back-end servers according to how it is configured. Currently LVS has three methods of routing: VS-NAT, VS-TUN and VS-DR.

VS-NAT works by using Network Address Translation, or NAT. Most people have had experience with NAT in the form of IP masquerading. There are other forms of NAT, however. Static NAT maps addresses one to one onto other addresses, while dynamic NAT takes the translated address from a pool (the pool usually contains fewer addresses than the number of hosts you wish to translate). IP masquerading is just a special form of dynamic NAT; it is a many to one translation.
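The three forms of NAT described above can be compared in a small Python sketch. This is a toy model for illustration only, not how any kernel implements NAT; the class names, the addresses and the starting port number are all assumptions made for the example.

```python
class StaticNat:
    """One-to-one: every inside address has a fixed outside address."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)

    def translate(self, addr):
        return self.mapping[addr]


class DynamicNat:
    """Addresses are handed out from a pool on first use."""

    def __init__(self, pool):
        self.free = list(pool)
        self.leases = {}

    def translate(self, addr):
        if addr not in self.leases:
            if not self.free:
                raise RuntimeError("address pool exhausted")
            self.leases[addr] = self.free.pop(0)
        return self.leases[addr]


class Masquerade:
    """Many-to-one: all hosts share one address, told apart by port."""

    def __init__(self, public_addr, first_port=61000):  # port is arbitrary
        self.public_addr = public_addr
        self.next_port = first_port
        self.ports = {}

    def translate(self, addr, port):
        key = (addr, port)
        if key not in self.ports:
            self.ports[key] = self.next_port
            self.next_port += 1
        return self.public_addr, self.ports[key]
```

Note how the dynamic pool can run out when more hosts want translation than there are pool addresses, whereas masquerading only needs a free port on the single shared address.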

The VS-NAT code is based on the IP masquerading and port forwarding code. It works by matching the destination address and port of each incoming client request against a list of known virtual server services. The virtual server rule table is then used to choose a real server, the mapping is recorded, and the packet's destination address and port are rewritten before it is forwarded on to the real server. When reply packets come back, the director rewrites their source address to that of the virtual server and forwards them on to the client. Timeouts or connection terminations simply remove the mapping from the table.
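To make the bookkeeping concrete, here is a minimal Python sketch of the connection table a VS-NAT director keeps. It is a toy model and not the LVS kernel code: the NatDirector class, its method names and the use of plain round robin to pick a real server are assumptions made for the example.

```python
from dataclasses import dataclass, field
from itertools import cycle


@dataclass
class NatDirector:
    """Toy model of a VS-NAT director (hypothetical, for illustration)."""
    vip: tuple                  # (address, port) of the virtual service
    real_servers: list          # [(address, port), ...]
    connections: dict = field(default_factory=dict)

    def __post_init__(self):
        # Assumption: servers are picked in simple round-robin order.
        self._next = cycle(self.real_servers)

    def inbound(self, client, dst):
        """Client packet: rewrite the destination to a real server."""
        if dst != self.vip:
            return None         # not a virtual service we know about
        if client not in self.connections:
            self.connections[client] = next(self._next)
        return client, self.connections[client]

    def outbound(self, real, client):
        """Reply packet: rewrite the source so it appears to be the VIP."""
        if self.connections.get(client) != real:
            return None         # no recorded mapping for this reply
        return self.vip, client

    def close(self, client):
        """Timeout or connection termination: forget the mapping."""
        self.connections.pop(client, None)
```

Each method returns the (source, destination) pair the packet would carry after rewriting. Because every reply has to pass back through outbound(), the director handles traffic in both directions under this scheme.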

VS-TUN works by using IPIP encapsulation, or tunneling. Tunneling is a way of encapsulating IP packets inside IP packets, which allows redirection of packets to another destination. It is very similar to VS-NAT, except that instead of masquerading the packets to the real servers, the director sends them to the real servers via an IP tunnel. This means the servers can be in physically distributed locations, but they must support IP encapsulation.

In this method of routing, the director chooses a real server for packets accepted for the virtual server, based on its scheduling algorithm, and records the connection. It then encapsulates the packet and forwards it to the chosen server, which decapsulates the packet, processes the request, and returns the response to the client directly. Note that the real servers need to have a non-ARP device - usually lo - configured with the IP address of the virtual server, so that when the packets arrive they are treated as being destined for a local address.
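The encapsulation step can be illustrated with a short Python sketch that wraps a packet in an outer IPv4 header carrying protocol number 4 (IP-in-IP). It only builds and parses bytes in memory; it is not how LVS does it in the kernel, and the function names, the TTL of 64 and the zeroed identification field are assumptions chosen for brevity.

```python
import socket
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IP-in-IP


def checksum(header: bytes) -> int:
    """Standard Internet checksum over an even-length header."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(inner: bytes, director_ip: str, real_server_ip: str) -> bytes:
    """Prefix the original packet with an outer header addressed to the real server."""
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,                 # version 4, header length 5 words (20 bytes)
        0,                    # type of service
        20 + len(inner),      # total length
        0, 0,                 # identification, flags/fragment offset
        64,                   # TTL (arbitrary for this sketch)
        IPPROTO_IPIP,
        0,                    # checksum placeholder
        socket.inet_aton(director_ip),
        socket.inet_aton(real_server_ip),
    )
    header = header[:10] + struct.pack("!H", checksum(header)) + header[12:]
    return header + inner


def decapsulate(packet: bytes) -> bytes:
    """What the real server does: strip the outer header."""
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IPIP packet")
    header_len = (packet[0] & 0x0F) * 4
    return packet[header_len:]
```

The inner packet is untouched, so it still carries the virtual server's address as its destination. That is why the real server needs that address configured on a local, non-ARP device.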

VS-DR, or Virtual Server Direct Routing, works in a similar way to the others - it accepts packets for the virtual server, and then directly routes them to the selected server, which is chosen by some scheduling algorithm. Each of the servers has a non-ARP interface, perhaps lo, that has the virtual server IP address. This allows the director to change the MAC address of the packet to that of the server and retransmit it on the LAN that contains the director and the servers. The server receives these rewritten packets, sees that they are destined for a local address and processes the request, returning the result to the user directly.
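Since direct routing only touches the link-layer header, it can be sketched in a few lines of Python. This is an in-memory illustration on a raw Ethernet frame, not LVS code; the helper names and the MAC addresses in the usage example are made up, and setting the source MAC to the director's own is an assumption about how a forwarding host retransmits a frame.

```python
def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def direct_route(frame: bytes, director_mac: str, server_mac: str) -> bytes:
    """Re-address an Ethernet frame to the chosen real server.

    An Ethernet header is 6 bytes destination MAC, 6 bytes source MAC,
    then 2 bytes EtherType. Everything from byte 12 on, including the
    whole IP packet, is passed through unchanged.
    """
    return mac_bytes(server_mac) + mac_bytes(director_mac) + frame[12:]


# Usage with invented addresses: the IP payload is left untouched.
original = (mac_bytes("02:00:00:00:00:01")      # was addressed to the director
            + mac_bytes("02:00:00:00:00:99")    # from the gateway
            + b"\x08\x00"                       # EtherType: IPv4
            + b"ip packet for the virtual address")
rewritten = direct_route(original, "02:00:00:00:00:01", "02:00:00:00:00:02")
```

Because the IP destination is still the virtual server address, the real server only accepts the packet if that address is configured locally, and it must not answer ARP for it or it would compete with the director on the LAN.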

As mentioned previously, LVS has a few scheduling algorithms, namely round robin, weighted round robin, least connection and weighted least connection. There is much more involved in properly setting up LVS; for more information on these algorithms and on how to actually configure a server for LVS, see the project website at www.linuxvirtualserver.org.
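The four scheduling algorithms are easy to sketch in Python. These are simplified illustrations of the ideas, not the LVS implementations: the function names are invented, the weighted round robin here naively repeats each server according to its weight, and weights are assumed to be positive integers.

```python
from itertools import cycle


def round_robin(servers):
    """Hand out servers in turn, forever."""
    return cycle(servers)


def weighted_round_robin(weights):
    """weights: {server: weight}. A server with weight 2 gets two turns
    per cycle. (Naive expansion; real schedulers interleave the turns.)"""
    return cycle([s for s, w in weights.items() for _ in range(w)])


def least_connection(active):
    """active: {server: open connections}. Pick the least busy server."""
    return min(active, key=active.get)


def weighted_least_connection(active, weights):
    """Pick the server with the fewest connections relative to its weight."""
    return min(active, key=lambda s: active[s] / weights[s])


# Usage: a server with three times the weight may carry more connections
# and still be chosen.
choice = weighted_least_connection({"big": 6, "small": 4},
                                   {"big": 3, "small": 1})
```

The round robin variants ignore what the servers are currently doing, while the least connection variants need the director's connection table, which it maintains anyway.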

High Availability

On the other side of the clustering coin we have high availability. Where the virtual servers work by balancing the load over multiple servers, high availability works by having redundant servers ready to take over when the active server crashes. The aim is to eliminate single points of failure by either making parts of each computer redundant, or making the whole computer redundant. The level of redundancy you need depends on how much downtime costs you: there are many things you can do to increase reliability, but they all come at a cost.

There are several types of failover scheme in the existing HA market: idle standby, rotating standby, simple failover and mutual takeover. Depending on the vendor they might have different names, but the concepts are fairly similar.

Idle standby works by, as the name suggests, having an idle standby. This means there are one or more servers standing by doing nothing. A standby can be identical to the working server, or it can be of lesser spec., as long as it is sufficient to ensure operation at some level. Idle servers are given a priority, and the server with the highest priority takes over when the real server fails.

Rotating standby is similar to idle standby, but the idle servers are not given a priority - it is a simple FIFO (first in, first out) type of replacement strategy. In this case each server needs to be about equal in spec., as there is no ranking.
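The difference between the two standby schemes comes down to how the replacement is chosen, which the following Python sketch tries to show. The names are hypothetical, and treating a larger number as a higher priority is an assumption; vendors may number priorities the other way around.

```python
from collections import deque


def idle_standby_takeover(standbys):
    """standbys: {server: priority}. The highest priority takes over."""
    return max(standbys, key=standbys.get)


class RotatingStandby:
    """No priorities: standbys are promoted in first in, first out order."""

    def __init__(self, active, standbys):
        self.active = active
        self.queue = deque(standbys)

    def fail(self):
        """The active server failed: promote whoever has waited longest."""
        self.active = self.queue.popleft()
        return self.active

    def repaired(self, server):
        """A repaired server rejoins at the back of the queue."""
        self.queue.append(server)
```

With rotating standby a repaired server does not reclaim its old role; it simply waits its turn, which is why the machines need to be of roughly equal spec.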

Simple failover is where the backup server runs some non-critical application but takes over the critical application when the primary server fails. The backup server does not have to be able to cope with both jobs - it is sufficient for it to just take over the work of the critical application(s) and drop the non-critical ones for the period of the downtime. When the primary server comes back, the backup server simply resumes the non-critical applications.

Mutual takeover is where two (or more) servers are configured such that they can take over the other's jobs, while still running their own applications. If the servers are not powerful enough to run both applications, the downgrade in performance must be acceptable while fixes are made.
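Simple failover and mutual takeover differ in what happens to the work the surviving server was already doing. A minimal Python sketch, using invented function and server names, might look like this; moving a failed server's applications to the least loaded survivor is an assumption made for the example, not something any particular HA product is known to do.

```python
def simple_failover(primary_up, critical, non_critical):
    """Return {server: [applications]} for a primary/backup pair.

    While the primary is down, the backup drops its non-critical work
    and runs only the critical applications.
    """
    if primary_up:
        return {"primary": list(critical), "backup": list(non_critical)}
    return {"backup": list(critical)}


def mutual_takeover(assignments, failed):
    """assignments: {server: [applications]}.

    The survivors keep their own applications, and the failed server's
    applications are added to the survivor running the fewest.
    """
    survivors = {s: list(apps) for s, apps in assignments.items()
                 if s != failed}
    target = min(survivors, key=lambda s: len(survivors[s]))
    survivors[target].extend(assignments[failed])
    return survivors
```

In the mutual takeover case the chosen survivor ends up running more applications than it was sized for, which is where the acceptable downgrade in performance comes in.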

Another simple form of clustering is round robin DNS. This is only applicable for simple connection services, and works by having the DNS entry for the service point to multiple hosts. The problem with this is that all hosts are considered equal - there is no way to balance requests according to each host's load or capacity. In addition, some clients will cache the response to a DNS request, thus defeating the clustering. However, for all its faults, using round robin DNS is a cheap way of getting some form of clustering.
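Both the mechanism and the caching problem can be seen in a small Python simulation. No real DNS queries are made here; the RoundRobinZone and CachingClient classes are hypothetical stand-ins, and the addresses come from a documentation range.

```python
from collections import deque


class RoundRobinZone:
    """Stand-in for a DNS server holding several A records for one name."""

    def __init__(self, addresses):
        self.records = deque(addresses)

    def resolve(self):
        """Return all records, rotating the order for the next query."""
        answer = list(self.records)
        self.records.rotate(-1)
        return answer


class CachingClient:
    """A client that resolves once and reuses the answer indefinitely."""

    def __init__(self, zone):
        self.zone = zone
        self.cached = None

    def connect_target(self):
        if self.cached is None:
            self.cached = self.zone.resolve()[0]
        return self.cached


zone = RoundRobinZone(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
fresh_lookups = [zone.resolve()[0] for _ in range(4)]   # spreads across hosts
client = CachingClient(zone)
cached_lookups = [client.connect_target() for _ in range(4)]  # one host only
```

Clients that look the name up every time are spread across the hosts in turn, but the caching client keeps hitting the same host, and nothing in the scheme notices whether that host is overloaded or even alive.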

One important thing to remember is that clustering, regardless of what type, is only as good as its implementation. This means it will still require monitoring (and notification), it will still need backups, and it will still need regular maintenance. There are stories about clusters that had all the redundancy in the world, but still stopped working. This happens because the system slowly decays over time - hardware will always fail at some stage - and no one notices because there is no monitoring (or notification) of the system.
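Even a very small monitoring loop is better than none. The following Python sketch probes each server with a TCP connection attempt and reports the ones that do not answer. It is a starting point under simple assumptions, not a replacement for a real monitoring system: the function names are invented, a successful TCP connect is only a rough proxy for a working service, and notification here is just a callback that defaults to print.

```python
import socket


def tcp_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def monitor(servers, check=tcp_check, notify=print):
    """Probe each (host, port) once.

    Calls notify() with a message for every server that fails the check
    and returns the list of servers that passed.
    """
    healthy = []
    for host, port in servers:
        if check(host, port):
            healthy.append((host, port))
        else:
            notify(f"ALERT: {host}:{port} is not responding")
    return healthy
```

In practice this would be run on a schedule, with notify replaced by something that actually reaches a person, such as a function that sends mail or pages the administrator on call.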

As you have seen, there are many types of clustering and many ways each type can be used. It is an important technology to understand, as it will let you provide services for loads you never could have coped with before. However, it is not a silver bullet, and it has many tricks and traps of its own that you need to understand before you can fully utilize it.

Brad Marshall is a systems administrator at Plugged In Software, a small software development company in Brisbane, Australia. He spends his time hacking around on Debian, and writing articles for various places.