Originally Published: Monday, 4 September 2000 Author:

Unix Web Application Architectures - Part 1: Introduction and Basic Approaches

Unix Web Application Architectures
1. Introduction and Basic Approaches
2. The Web Server
3. Sessions, Authentication, and Databases
4. Other Issues
5. An Alternate Approach

1 Preface

In this series I discuss various aspects of writing web applications on Unix. By a web application I mean a piece of software that is used with an ordinary browser, without client-side Java or other major extensions. The reader is expected to have a basic understanding of the building blocks of web applications, such as HTML, JavaScript, HTTP and CGI. The focus is on problems that emerge when the application grows big enough that the simplest approaches become insufficient. Issues related to building web sites in general, such as maintaining static HTML pages, aren't considered.

The reader shouldn't expect this to be an unbiased or comprehensive discussion of the subject. Rather, this can be seen as an essay about the lessons I have learned while writing custom web applications for fun and for money. Because I use Linux as my platform, only the technologies available on Linux are considered. However, everything said should be applicable to other Unix variants. I mention Apache many times throughout the text as an example of an HTTP server. This is simply because I use it, many others use it, and it's a fine example of a general purpose web server. I'll use the terms "web server" and "HTTP server" pretty much interchangeably.

This document might be useful for someone who writes or is going to write a web application, and wants to get an overview of many possible approaches.

2 Introduction

2.1 An Example Application

One might call a hit counter a web application, but a program so simple is not of much interest from an architectural point of view. To give a picture of the features that the kind of architecture discussed here should support, let's assume one is building a web-based order management application with the following requirements:

  • There are three user classes: admins, normal users and customers.
  • Customers use the system over the internet for entering orders for themselves.
  • Admins and normal users use the system from the local network to add, modify and remove customers, products and orders.
  • Admins can use all features, other users only a subset of them.
  • The entire system must be maintainable through a browser.
  • Requests must be authenticated and logged.
  • The application must be easy to learn to use and robust in the face of user errors.
  • It is important that the application runs reliably and never corrupts data.
  • The application must be able to handle several requests per second, in parallel.
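
The user class requirements above can be made concrete with a small permission table. The following is only a sketch: the action names are invented for illustration, and a real application would probably load the table from its database rather than hard-code it.

```python
# Hypothetical mapping from user class to permitted actions. The action
# names are invented for this sketch and do not come from a real system.
PERMISSIONS = {
    "admin": None,  # None means "every action is allowed"
    "user": {"view_orders", "add_order", "modify_order", "view_customers"},
    "customer": {"view_own_orders", "add_own_order"},
}


def is_allowed(user_class, action):
    """Return True if the given user class may perform the action."""
    if user_class not in PERMISSIONS:
        return False  # unknown user classes get no access at all
    allowed = PERMISSIONS[user_class]
    return allowed is None or action in allowed
```

With this table, is_allowed("admin", "remove_customer") is true, while the same call for "customer" is false.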

2.2 Common Characteristics of Web Applications

Web apps tend to share a number of common characteristics, independent of the application domain. Therefore it makes sense to try to create a framework which handles these common characteristics as automatically, efficiently, and correctly as possible. When someone codes a framework that does these things, and makes a product of it, the product is often called an "application server." Indeed, this document can be seen as describing what an application server does, and how to write one. The term used in this document, however, is "application framework."

Below is a list of a number of features that many web apps have.

  • HTTP request reception. Often done by a separate web server.
  • Access control and authentication. Can be implemented by HTTP authentication, but other approaches can be better.
  • Session management.
  • HTTP request parameter validation.
  • A way to read and store data from/to a persistent store, usually a relational database.
  • Reasonably good latency and throughput.
  • Reasonable memory consumption in the event of several simultaneous requests.
  • Robustness against user and programmer errors.
  • Extensibility and connectivity to other systems.

How to achieve these and other features will be discussed in more detail in the rest of this document.
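
To show how the features listed above might fit together, here is a rough sketch of the path a single request takes through an application framework. Everything in it is an assumption made for illustration: the request is a plain dictionary, and the authenticate, validate and handler callables are supplied by the application.

```python
def handle_request(request, handlers, authenticate, validate):
    """Run one request through authentication, validation and dispatch.

    `request` is a plain dict with "user", "action" and "params" keys
    (a layout invented for this sketch). Returns a (status, body) tuple.
    """
    user = authenticate(request)
    if user is None:
        return 401, "authentication required"
    handler = handlers.get(request["action"])
    if handler is None:
        return 404, "unknown action"
    errors = validate(request["action"], request["params"])
    if errors:
        return 400, "; ".join(errors)
    try:
        return 200, handler(user, request["params"])
    except Exception as exc:
        # Robustness against programmer errors: a failing handler must
        # not take the whole application down.
        return 500, "internal error: " + type(exc).__name__
```

The point of the design is that individual handlers never deal with authentication or malformed input; the framework has already rejected such requests before they are called.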

Clustering and related features like load balancing and failover management, while important for some applications, are not discussed, for the simple reason that I have no experience with them.

3 Basic Approaches

I consider the CGI interface the most rudimentary way of creating web apps. However, it's not necessary to start building things completely from scratch on top of raw CGI. In this chapter I will mention a number of packages and technologies which implement, or make it easier to implement, one or more of the features mentioned in the previous chapter.
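
For reference, this is roughly what a program written directly against the CGI interface looks like. It is a minimal sketch in Python; the parameter name "name" is an assumption made for the example. The web server passes the query string in the QUERY_STRING environment variable, and the program writes the response headers, an empty line, and the body to standard output.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build a complete CGI response: headers, blank line, body."""
    # For a GET request the server puts the part of the URL after the
    # question mark into the QUERY_STRING environment variable.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    # "name" is a hypothetical parameter used only in this sketch.
    name = params.get("name", ["stranger"])[0]
    # Escape the value so user input cannot inject markup into the page.
    body = "<HTML><BODY>Hello, %s!</BODY></HTML>\n" % html.escape(name)
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    sys.stdout.write(cgi_response(os.environ))
```

Everything else on the feature list (sessions, authentication, validation, database access) has to be built by hand on top of this.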

3.1 Application Servers

The term application server is used in this text to refer to products such as Allaire's Cold Fusion, IBM WebSphere or the open source product Zope. These aim to be more or less all-encompassing solutions that handle all aspects of application development. Some have high end features like fail-over clustering. Many come with an easily reusable component library, or with entire prebuilt applications that can be customized. Usually these also address issues not related to coding, such as web site management and explicit support for multiple developers and HTML writers. Most have a price tag of at least $1000, and often well over $10,000.

As mentioned above, this document is in a sense about how to write an application server. But why bother, since products like this already exist? For smaller applications the price alone can be an obstacle, except with free products. The learning curve is another: these are full development environments, and it takes weeks or months to learn to use them, and longer to learn to use them well. If you are already fluent in a programming language suitable for writing a web app, such as Perl, Python or Java, this makes a huge difference. Flexibility may be another important factor: having written the application framework yourself, you can fully customize any aspect of it.

All this said, third party web application servers can be an excellent choice for many purposes, and many of the largest sites are built using them. Then again, many are not. In any case, these products are not further explored in this document.

3.2 Code Embedded in HTML

Probably the best known examples of this class of packages are the open source product PHP, and Active Server Pages (ASP), the language used in Microsoft's Internet Information Server. In this approach, the code is put into the HTML files. Before the web server sends the files to the browser, the code in them is processed.

This approach works particularly well if the majority of the web site is static HTML. It's typically easy to start programming in these languages, and their structure is well-suited for writing web applications, since that's the whole purpose of the environment. They come with a comprehensive function library for performing common tasks needed in building web sites and applications.
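
The processing step can be illustrated with a toy page processor. This is only a sketch of the general idea, not how PHP or ASP are implemented; the <?py ... ?> marker syntax is invented for the example.

```python
import re

# Invented marker syntax for this sketch: <?py EXPRESSION ?>
CODE_BLOCK = re.compile(r"<\?py\s+(.*?)\s*\?>", re.DOTALL)


def render(page, variables):
    """Replace every embedded expression with its evaluated result."""
    def evaluate(match):
        # eval() is tolerable here only because the page itself is written
        # by a trusted author, as is the case with PHP or ASP source files.
        return str(eval(match.group(1), {}, dict(variables)))
    return CODE_BLOCK.sub(evaluate, page)


page = "<HTML><BODY>Total: <?py price * quantity ?> EUR</BODY></HTML>"
print(render(page, {"price": 5, "quantity": 3}))
```

Everything outside the markers is passed through untouched, which is why the approach suits pages that are mostly static HTML.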

3.2.1 PHP

In this application class I only have personal experience with PHP, but I have been told ASP is essentially similar. I found PHP easy to learn (having a background in languages including C++, Perl and shell), and intuitive to use for its intended purpose.

As an example of PHP, assume an HTML form (the example is taken from http://www.zend.com/zend/art/intro.php):

  <FORM METHOD="GET" ACTION="submit.php">
  What's your name? <INPUT NAME="myname" SIZE=3>
  </FORM>

The file submit.php would then contain, for example:

  <HTML><BODY>

  <?php
  print "Hello, $myname!";
  ?>

  </BODY></HTML>

You probably get the idea from that. If not, the PHP web site gives more information. While I found PHP easy to use, it lacks a number of features that I think the perfect web application environment should have. These are listed below, and discussed in more detail in later chapters. I'm assuming PHP is being used as the Apache module mod_php, in which case it runs as part of the Apache process.

  • The fundamental characteristic of this group of products: no good separation of code and HTML. This is a convenient way to do things when the amount of code is small compared to the amount of HTML, but when the amount of code grows, and when different groups of people edit the HTML and the code, this becomes awkward.
  • No support for validating request parameters. In web applications, HTTP requests can be considered function or method calls. The request parameters (the part after the question mark in a GET URL, or stdin with POST) are then analogous to function call arguments. I find it very convenient to be able to systematically check that an HTTP request has the correct set of parameters, that is, that they exist and that their format is as expected, for instance that some of them are numbers while others can be arbitrary strings. PHP doesn't support this.
  • Since PHP is an Apache module, each simultaneous HTTP request requires a separate Apache process. Therefore the memory consumption can become quite high with a large number of simultaneous requests. Note that this doesn't need to mean high CPU load: it may take the client a long time to receive the response, and the Apache process isn't freed until the client has received all of the response.
  • Session management support is based on cookies. Some third party libraries exist to allow more complex session handling.
  • A third party module library exists, and is of quite reasonable size.
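
To make the point about request parameter validation concrete, here is a sketch of the kind of systematic check I have in mind, written in Python. The request and its parameter names are hypothetical; each handler declares the parameters it expects, and the framework checks every incoming request against that declaration before the handler runs.

```python
import re

# Hypothetical declaration for an "add order line" request: each expected
# parameter is mapped to a regular expression its whole value must match.
ADD_ORDER_LINE = {
    "product_id": r"[0-9]+",      # a number
    "quantity": r"[1-9][0-9]*",   # a positive number
    "comment": r".*",             # an arbitrary string
}


def validate(spec, params):
    """Return a list of error messages; an empty list means the request is valid."""
    errors = []
    for name, pattern in spec.items():
        if name not in params:
            errors.append("missing parameter: " + name)
        elif not re.fullmatch(pattern, params[name], re.DOTALL):
            errors.append("malformed parameter: " + name)
    for name in params:
        if name not in spec:
            errors.append("unexpected parameter: " + name)
    return errors
```

This treats the declaration much like a function signature: a request with missing, malformed, or surplus parameters is rejected in one place instead of in every handler.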

3.3 mod_perl

The Apache module mod_perl allows running the Perl interpreter as part of Apache processes. This way the Perl bytecode is cached in the Apache processes, which speeds up execution a lot. Often mod_perl is simply used for speeding up Perl CGI scripts. The "native" programming style with mod_perl is the same as with CGI: print statements scattered around the code. In addition, mod_perl comes with a set of modules and extensions helpful for writing web apps, which makes it more than just a way to speed up Perl CGI scripts.

Perl is an efficient and mature language, with a lot of users, books, and support services. There exists a very large number of third party modules for almost any purpose. These things make it a good general purpose language, and it's also well suited for writing web applications.

Perl was born as a language for Unix system administration and text manipulation, and as such, it's not focused on web application development like PHP is. PHP is probably easier to learn for those not already familiar with either language, while mod_perl may be favored over PHP by those who already know Perl, or want to use a language with maximal flexibility.

Copyright (c) 2000 by Samuli Kärkkäinen <skarkkai@woods.iki.fi>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).




