Linux.com Article DB: Unix Web Application Architectures - Part 2: The Web Server

Unix Web Application Architectures
1. Introduction and Basic Approaches
2. The Web Server
3. Sessions, Authentication, and Databases
4. Other Issues
5. An Alternate Approach

4 HTTP Server

One choice to make when deciding on the architecture of an application is whether to use an existing web server for HTTP request handling, or if it would be better to implement an HTTP server as part of the application.

The first HTTP protocol version, now known as HTTP version 0.9, was a very simple one. There was just the GET method with no additional information, and the response was the document body as is. Not much was gained by using a separate web server instead of embedding one in one's application. But that was the situation 5 years ago.
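To show how little there was to implement, here is a minimal Python sketch of HTTP/0.9 request handling. The function name and the in-memory `documents` dictionary are hypothetical stand-ins. A real server would read files from disk and write the result to a socket before closing it.

```python
def handle_http09(request_line: bytes, documents: dict) -> bytes:
    """Serve one HTTP/0.9 request: 'GET /path' in, raw document bytes out.

    HTTP/0.9 has no status line, no headers and no other methods; the
    server writes the document and then simply closes the connection.
    """
    parts = request_line.strip().split()
    if len(parts) != 2 or parts[0] != b"GET":
        return b""  # HTTP/0.9 has no way to report a malformed request
    path = parts[1].decode("ascii", "replace")
    # A missing document is reported with an ordinary HTML body.
    return documents.get(path, b"<html><body>Not found</body></html>")
```

Everything HTTP/1.1 adds (status codes, headers, persistent connections, chunked transfer) has to be layered on top of this.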

The current HTTP protocol version is 1.1, and the RFC specifying it (RFC 2616) is over 400 KB of text. It defines eight methods and nineteen request headers. A good implementation should also work around bugs in client implementations. For example, some browsers don't handle keep-alive connections properly, so that feature shouldn't be used with those buggy browsers (as identified by the User-Agent header). For these reasons, if good HTTP protocol compliance matters, it's a good idea to use a separate HTTP server. An additional benefit is that some application features, such as automatic directory listings, HTTP access control or redirections, are often best implemented by the web server.
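The keep-alive workaround described above might look roughly like the following sketch. The blacklist entry is a hypothetical example; real servers keep such lists in their configuration. `headers` is assumed to be a plain dictionary with canonical header-name capitalization.

```python
# Hypothetical blacklist: User-Agent substrings whose keep-alive support
# is assumed to be broken.
NO_KEEPALIVE_AGENTS = ("Mozilla/2",)

def use_keep_alive(request_version, headers):
    """Decide whether to keep the connection open after this response."""
    agent = headers.get("User-Agent", "")
    if any(bad in agent for bad in NO_KEEPALIVE_AGENTS):
        return False  # known-buggy client: always close
    # Connection may list several comma-separated tokens.
    tokens = {t.strip().lower() for t in headers.get("Connection", "").split(",")}
    if request_version == "HTTP/1.1":
        # HTTP/1.1 connections are persistent unless the client sends "close".
        return "close" not in tokens
    # HTTP/1.0 clients have to ask for keep-alive explicitly.
    return "keep-alive" in tokens
```

This is only one of many client quirks a complete server handles, which is part of the argument for not writing your own.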

Reasons for not wanting to use a separate web server include:

Installation of the application should be as easy as possible.
Memory consumption should be minimized. This might be a serious consideration mainly in appliances and embedded environments.
The usage patterns are known to be such that a very simple HTTP implementation suffices. That is, clients won't be normal browsers, or only one browser version is used.
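When one of these conditions holds, an embedded server can be very small. The sketch below uses Python's standard `http.server` module as a stand-in for whatever minimal HTTP layer the application would embed. The handler and function names are hypothetical. The standard library documentation itself does not recommend this module for production use, which illustrates the compliance trade-off described above.

```python
import html
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class AppHandler(BaseHTTPRequestHandler):
    """Answers every GET with a page generated by the application itself."""

    def do_GET(self):
        body = f"<html><body>You asked for {html.escape(self.path)}</body></html>".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log line on stderr

def start_embedded_server(port=0):
    """Run the server in a background thread; port 0 picks any free port."""
    server = HTTPServer(("127.0.0.1", port), AppHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

`start_embedded_server().server_address` tells the application which port it got.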

5 Performance

5.1 Static Content

Web servers are largely marketed on speed claims. It probably sounds very impressive to some people that a web server can serve so many thousand static files per second. In some extreme cases this can indeed make a difference, but those cases are rare. When serving static files, performance is almost always sufficient. In a quick test I performed, Apache 1.3.11 on Linux 2.0.38 on a 300 MHz Pentium II, with a mostly default configuration, served over 300 small files a second.

When serving static content, the available bandwidth usually becomes the bottleneck long before the CPU. If each of those small files mentioned above is one kilobyte, then the transfer rate is about 300*1024*8 bits per second, or roughly 2.5 megabits per second. That is somewhat more than one and a half T1 lines (1.544 Mbit/s each). When serving 25-kilobyte pictures, the server was able to send them at a total speed of about 4 megabytes per second, which is 32 megabits per second, or over two thirds of a T3 (44.736 Mbit/s). Even then only a third of the CPU was used.
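The arithmetic above can be written out so it is easy to redo with your own file sizes and rates. This sketch uses the standard T1 and T3 line rates. Megabits are taken as 10^6 bits, and "4 megabytes" is treated as 4 × 8 = 32 megabits, as in the text.

```python
T1_MBPS = 1.544   # T1 line rate, megabits per second
T3_MBPS = 44.736  # T3 line rate, megabits per second

def megabits_per_second(files_per_second, file_size_bytes):
    """Bandwidth needed to serve files of a given size at a given rate."""
    return files_per_second * file_size_bytes * 8 / 1_000_000

small = megabits_per_second(300, 1024)   # the 300 one-kilobyte files/s test
print(f"{small:.2f} Mbit/s = {small / T1_MBPS:.1f} T1 lines")   # 2.46 Mbit/s = 1.6 T1 lines

pictures = 4 * 8                         # 4 megabytes per second, as measured
print(f"{pictures} Mbit/s = {pictures / T3_MBPS:.2f} of a T3")  # 32 Mbit/s = 0.72 of a T3
```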

Because the problems of serving static content at very high speed have mainly to do with the choice and tuning of the web server, operating system and hardware, the rest of this chapter focuses on dynamic content.

5.2 Dynamic Content

In most sites, the majority of time is spent executing dynamic code. However, before spending a lot of time and effort speeding up your dynamic code, take a moment to consider what latency and throughput you really need. I personally follow a rule of thumb that request latency for common operations should be kept under 200 ms. A delay of this size gets lost among the other delays in fetching a page and rendering it in a browser. For rarely used operations the latency can of course be far higher. In 200 ms you can do a lot with current CPUs.
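One way to keep an eye on such a budget is to time every request handler and report the ones that exceed it. The decorator below is a hypothetical helper, not part of any framework. It measures wall-clock time inside the application only, so network and rendering delays come on top of it.

```python
import time
from functools import wraps

LATENCY_BUDGET_S = 0.200  # the 200 ms rule of thumb

def latency_budget(budget=LATENCY_BUDGET_S, report=print):
    """Decorator that reports any call taking longer than `budget` seconds."""
    def decorate(handler):
        @wraps(handler)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return handler(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                if elapsed > budget:
                    report(f"{handler.__name__} took {elapsed * 1000:.0f} ms "
                           f"(budget {budget * 1000:.0f} ms)")
        return wrapper
    return decorate
```

A handler is then declared as `@latency_budget()` above its `def`, and in production `report` would typically write to a log instead of printing.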

5.2.1 Things to Not Do

Things you cannot do in under 200 ms include, for instance:

You can't start up a large Perl program. The compilation to bytecode, especially if you use large modules, is very slow. For example, the command "perl -e 'use Date::Manip'" takes 630 ms on the computer mentioned above.
You can't connect to some slow-to-connect databases. Connections should be persistent.
You can't make very heavy SQL requests, assuming you use SQL.
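A persistent connection simply means that the connection is opened once per long-running process and reused by every request that process serves. In this minimal sketch, Python's built-in `sqlite3` stands in for a slow-to-connect database server, and the table and function names are hypothetical. This only pays off in a process that outlives a single request (a FastCGI daemon, for example). A plain CGI program would still reconnect on every hit. The sketch also assumes one request at a time per process.

```python
import sqlite3

_connection = None  # opened once per daemon process, shared by all requests

def get_connection():
    """Return the process-wide connection, paying the setup cost only once."""
    global _connection
    if _connection is None:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO users VALUES (1, 'alice')")
        _connection = conn
    return _connection

def handle_request(user_id):
    # No connect() here: every request reuses the existing connection.
    row = get_connection().execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None
```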

5.2.2 FastCGI

In the traditional CGI model, the CGI program is an executable file in the filesystem that starts, executes and exits once for every HTTP request. Starting the CGI program can be slow, especially if it is written in an interpreted language that must first be compiled to bytecode, or if it connects to a database or does some other slow initialization. Fundamentally, the benefit of an always-running daemon is that things such as bytecode, database connections and precompiled HTML templates can be cached in memory between requests.

FastCGI exists to solve this performance problem. In the FastCGI model, the CGI program is a continually running daemon, and the web server makes a connection to it over TCP/IP or another mechanism (such as a Unix domain socket) for each HTTP request. FastCGI defines how the CGI parameters (environment and stdin) are passed from the web server to the CGI daemon over this connection, and how the reply is passed back to the web server. This also makes it possible to run the daemon on a different host or set of hosts than the web server, perhaps with a firewall in between to further improve security.
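To make "how the CGI parameters are passed" concrete, here is a sketch of how a web server could encode the CGI environment into FastCGI records, following the FastCGI 1.0 specification. Every record starts with an 8-byte header (version, type, request id, content length, padding length, reserved byte). Environment variables travel in FCGI_PARAMS records as length-prefixed name/value pairs. The helper function names are hypothetical, and the other record types of a full exchange are not shown.

```python
import struct

FCGI_VERSION_1 = 1
FCGI_PARAMS = 4  # record type that carries the CGI environment

def encode_length(n):
    """Name/value lengths: 1 byte if < 128, else 4 bytes with the top bit set."""
    if n < 128:
        return bytes([n])
    return struct.pack(">I", n | 0x80000000)

def encode_params(params):
    """Encode a dict of CGI variables as FastCGI name-value pairs."""
    out = bytearray()
    for name, value in params.items():
        n, v = name.encode("utf-8"), value.encode("utf-8")
        out += encode_length(len(n)) + encode_length(len(v)) + n + v
    return bytes(out)

def make_record(rec_type, request_id, content=b""):
    """Wrap content in an 8-byte record header, padded to a multiple of 8."""
    if len(content) > 0xFFFF:
        raise ValueError("content too long for one record; split it")
    padding = -len(content) % 8
    header = struct.pack(">BBHHBx", FCGI_VERSION_1, rec_type, request_id,
                         len(content), padding)
    return header + content + bytes(padding)
```

In a complete request, the web server sends an FCGI_BEGIN_REQUEST record, then the FCGI_PARAMS records terminated by an empty one (`make_record(FCGI_PARAMS, 1)`), then the request body as FCGI_STDIN records, also terminated by an empty record. The daemon answers with FCGI_STDOUT records and a final FCGI_END_REQUEST.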

To process multiple requests in parallel efficiently, there should probably be multiple daemon processes running continuously. When a new request arrives, one of the idle processes is chosen to handle it. If there are no idle processes, more may be started. This technique is called preforking; Apache, for example, works this way.
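The dispatch policy just described can be modelled in a few lines. This is only the bookkeeping: workers are integer ids, and no real processes are started. A real implementation would `fork()` a child for each id and learn its status through pipes or shared memory. Apache 1.3 additionally keeps a configurable number of spare idle children around (its MinSpareServers and MaxSpareServers directives), which this sketch omits. The class name is hypothetical.

```python
class PreforkPool:
    """Bookkeeping model of a preforking pool (no processes are created)."""

    def __init__(self, start_workers=2, max_workers=8):
        self.max_workers = max_workers
        self.next_id = 0
        self.idle = []
        self.busy = set()
        for _ in range(start_workers):
            self._spawn()

    def _spawn(self):
        self.idle.append(self.next_id)  # stands in for fork()
        self.next_id += 1

    def dispatch(self):
        """Pick an idle worker for a new request, starting one if needed."""
        if not self.idle:
            if len(self.busy) >= self.max_workers:
                return None  # pool saturated: the request has to wait
            self._spawn()
        worker = self.idle.pop()
        self.busy.add(worker)
        return worker

    def finish(self, worker):
        """The worker is done with its request and becomes idle again."""
        self.busy.remove(worker)
        self.idle.append(worker)
```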

The fastest possible FastCGI implementation runs as a module of the web server, so that executing the FastCGI code doesn't require executing an external program, or even a context switch. The performance is normally perfectly sufficient even if the web server side of FastCGI is implemented as a small CGI program written in C that forwards each request to the daemon. A computer such as the one mentioned above is able to fork and execute a simple statically linked program 500 times a second, which adds only about 2 ms per request.
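To check what process startup costs on your own hardware, you can time repeated launches of a trivial program. This is a hypothetical helper. Python's `subprocess` adds its own overhead, and a program such as `/bin/true` is usually dynamically linked, so the result is only a rough lower bound on the rate, not a reproduction of the figure above.

```python
import subprocess
import time

def spawns_per_second(command, runs=200):
    """Start `command` `runs` times in a row and return the achieved rate."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True)
    return runs / (time.perf_counter() - start)
```

For example, `rate = spawns_per_second(["/bin/true"])` gives launches per second, and `1000 / rate` gives the per-request overhead in milliseconds.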

5.2.3 Caching

Cache the output of dynamic code. If you think latency significantly below 200 ms is needed, you should probably consider aggressively caching the dynamic pages. This means that once you have generated a page, you keep the entire page in memory or in a quickly accessible file. When the next request for the same page arrives, and you can be sure you'd output an identical page, you instead return the cached page. This is especially useful if the data from which the dynamic pages are generated changes rarely, as is often the case.
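A minimal whole-page cache could look like the following sketch, where `render` is a hypothetical function that generates a page from its URL path. The essential rule is the one stated above: a cached copy may only be returned while it is guaranteed to be identical to a fresh one. Here that is enforced crudely by dropping the whole cache whenever the underlying data changes. The cache also grows without bound, which a real implementation would limit.

```python
class PageCache:
    """Keeps complete generated pages in memory, keyed by URL path."""

    def __init__(self, render):
        self.render = render  # function(path) -> page text; the slow part
        self.pages = {}

    def get(self, path):
        if path not in self.pages:
            self.pages[path] = self.render(path)  # generate once...
        return self.pages[path]                   # ...then serve the copy

    def invalidate(self):
        """Call whenever the data behind the pages changes."""
        self.pages.clear()
```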

Often it's best to divide pages into parts that can be separately cached. For instance, an application might cache the complex but rarely changing main part of some page, while the simple header that is different in each request is always created from scratch. When serving a request, the pieces are merged and sent to the browser. This idea can be generalized into any sub-results, such as commonly occurring database queries.
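Splitting a page this way can be sketched with the standard `functools.lru_cache`. All names here are hypothetical. Passing a data version number as part of the cache key is one simple way to make the cached main part go stale automatically when the data behind it changes.

```python
import html
import time
from functools import lru_cache

@lru_cache(maxsize=128)
def main_part(article_id, data_version):
    """Complex, rarely changing part. data_version is part of the cache key,
    so bumping it after an update forces a re-render."""
    return f"<div>article {article_id}, rendered from data version {data_version}</div>"

def header(user):
    """Simple part that differs on every request; always built from scratch."""
    return f"<p>Hello {html.escape(user)}, the time is {time.strftime('%H:%M:%S')}</p>"

def page(article_id, user, data_version):
    # Merge the fresh header with the cached main part.
    return header(user) + main_part(article_id, data_version)
```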

It may require a lot of thought and serious tradeoffs to arrive at an application design that allows maximal caching of pages, but it can also be the best way to speed things up. Trying to add caching once the application is finished is usually asking for trouble: nothing is easier than letting the cache and the real data drift out of sync, unless the caching layer is properly designed and the application architecture is "caching friendly."

A different approach to caching dynamic content is presented later, in chapter 7 (Squid as an HTTP Server Accelerator). A lot has been written about this subject.

5.2.4 Other Things

If you handle large amounts of data, plan your data storage system carefully. Sometimes custom data storage code is necessary, but more often it makes sense to use an off-the-shelf database. Using a database doesn't automatically make things fast; sometimes it does quite the opposite. Know your database well, and run benchmarks.
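A benchmark does not have to be elaborate to be useful. The sketch below uses Python's built-in `sqlite3` purely as an example database. The table, the query and the row counts are made up. It times the same query before and after adding an index. Run the equivalent against your own database and data, since the numbers from this toy setup say nothing about them.

```python
import sqlite3
import time

def time_query(conn, sql, args, repeat=200):
    """Average wall-clock seconds per execution of one query."""
    start = time.perf_counter()
    for _ in range(repeat):
        conn.execute(sql, args).fetchall()
    return (time.perf_counter() - start) / repeat

def index_benchmark(rows=20_000):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hits (id INTEGER PRIMARY KEY, page TEXT)")
    conn.executemany("INSERT INTO hits (page) VALUES (?)",
                     ((f"/page/{i % 500}",) for i in range(rows)))
    sql = "SELECT COUNT(*) FROM hits WHERE page = ?"
    before = time_query(conn, sql, ("/page/42",))  # full table scan
    conn.execute("CREATE INDEX hits_page ON hits (page)")
    after = time_query(conn, sql, ("/page/42",))   # can use the index
    conn.close()
    return before, after
```

Calling `before, after = index_benchmark()` gives the average seconds per query in each case.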

Add more CPUs. This can be an easy, although not free, way to increase throughput, but it does not necessarily improve latency. Usually there are not many simultaneous requests underway, so latency matters more. Of course, adding CPUs only helps if the application is able to process multiple requests in parallel.

Add more servers, and balance the load between them using multiple IP addresses for your web server's DNS name, an IP-level load balancer device, or FastCGI (see above) with daemons on multiple hosts. This can also increase availability. Remember, though, that it may not be possible to distribute the database, which can thus become a bottleneck.
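The availability gain depends on the distributing component noticing failed hosts. Plain DNS round robin cannot do this, but a load balancer or a FastCGI front end can. Here is a minimal sketch of round-robin selection that skips hosts marked as down. The class and host names are hypothetical, and how hosts get marked down (health checks, connection errors) is left out.

```python
import itertools

class RoundRobin:
    """Hands out backend hosts in turn, skipping any marked as down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.down = set()
        self._cycle = itertools.cycle(self.backends)

    def mark_down(self, host):
        self.down.add(host)

    def mark_up(self, host):
        self.down.discard(host)

    def next_backend(self):
        # Try each backend at most once before giving up.
        for _ in range(len(self.backends)):
            host = next(self._cycle)
            if host not in self.down:
                return host
        raise RuntimeError("no backend available")
```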

Have enough memory that you don't need to do much disk I/O. If most of the data isn't in buffers, things slow down a great deal. This of course applies to any kind of application. If no amount of RAM is enough, you're probably dealing with very large databases, and you should become very familiar with your particular database server to optimize its performance.

5.2.5 Performance is Rarely a Big Problem

Despite all the talk about performance in relation to web sites, almost nobody writes dynamic web sites entirely in a maximally fast language such as C, not even very high traffic sites. This should be proof enough that performance isn't something you should worry about excessively most of the time. Just avoid the really bad mistakes, the most common of which are listed above, and you should be fine for most applications.

6 Memory Consumption

Memory consumption of a web site depends on several factors: the web server in use, the proportion of hits that are for dynamic pages, the application architecture, the hit rate and the wall clock time it takes to serve a hit.

6.1 Static Content

The number of requests being processed simultaneously is the product of the request rate and the time it takes to process one hit. Assume processing a hit for a static file takes 1 ms of CPU time but 5 seconds of wall clock time (because the client receives the file slowly). If a site gets 100 such hits a second (which is rare), it has on average 500 requests being processed at all times, but uses only 10% of its CPU time.
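The product rule above is Little's law. Written out with the example's numbers, it shows why concurrency, not CPU load, drives memory use here. The function names are illustrative.

```python
def concurrent_requests(hits_per_second, wall_seconds_per_hit):
    """Little's law: average number of requests in progress at once."""
    return hits_per_second * wall_seconds_per_hit

def cpu_utilisation(hits_per_second, cpu_seconds_per_hit, cpus=1):
    """Fraction of total CPU capacity spent serving hits."""
    return hits_per_second * cpu_seconds_per_hit / cpus

print(concurrent_requests(100, 5))            # 500
print(f"{cpu_utilisation(100, 0.001):.0%}")   # 10%
```

Multiplying the first figure by the memory needed per active request (a process, a thread, or just a connection buffer) estimates the memory the server requires.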

If the web server requires one process (as opposed to a thread) for each active request, memory consumption gets very high with 500 simultaneous requests. Apache is an example of such a server design, although Apache 2.0 is supposed to be able to run in a multithreaded mode, which may make it more suitable for this purpose.

Some other web servers create just one thread for each request, or don't rely on the operating system for concurrency at all and instead handle the multiplexing on their own. The latter approach in particular can keep memory usage very small even with a very high number of simultaneously active requests. For this reason, web servers that use one process per request aren't a good choice in these situations.
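The self-multiplexing design can be illustrated with Python's `asyncio`. This example stands in for event-driven web servers in general. It is not a real HTTP implementation, and the names are hypothetical. One thread serves every connection. Each client costs a small coroutine and its buffers rather than a whole process, so a slow client merely leaves its coroutine waiting while the event loop serves the others.

```python
import asyncio

PAGE = b"HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n"

async def handle(reader, writer):
    # One coroutine per connection; waiting here blocks no other client.
    try:
        await reader.readuntil(b"\r\n\r\n")  # read the request headers
        writer.write(PAGE)
        await writer.drain()                 # wait until the data is sent
    except (asyncio.IncompleteReadError, ConnectionError):
        pass  # the client went away; just close the socket
    finally:
        writer.close()
        await writer.wait_closed()

async def serve(port=0):
    """Start listening; port 0 picks any free port."""
    return await asyncio.start_server(handle, "127.0.0.1", port)
```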

6.2 Dynamic Content

When serving dynamic content, it is useful to separate the code generating the pages from the web server process. The idea is that the dynamic response is generated quickly by a separate process, and then buffered in the web server while it's being returned to the browser over the potentially slow network path. This way the process generating the dynamic content is freed quickly, and doesn't consume memory unnecessarily.

FastCGI has the potential to do this, if either the web server or the FastCGI implementation is able to buffer the entire HTTP response and thus free the daemon as soon as it has finished generating the response.
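The buffering idea can be modelled with a single-worker thread pool standing in for the scarce page-generating daemon, and a coroutine that trickles bytes out standing in for the slow client link. All names are hypothetical. In a real deployment, the web server would be the component buffering the FastCGI output. The point is the ordering: the generator finishes and is released before the slow transfer begins.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

generator_pool = ThreadPoolExecutor(max_workers=1)  # the memory-hungry daemon

def generate_page(path):
    """Stands in for the application code; finishes quickly."""
    return f"<html><body>{path}</body></html>".encode()

async def send_slowly(received, data, chunk_size=8, delay=0.01):
    """Simulates a slow network path by trickling the buffered bytes out."""
    for i in range(0, len(data), chunk_size):
        received.append(data[i:i + chunk_size])
        await asyncio.sleep(delay)

async def serve_request(path):
    loop = asyncio.get_running_loop()
    # 1. Generate the whole response; the worker is free again afterwards.
    body = await loop.run_in_executor(generator_pool, generate_page, path)
    # 2. Only the buffered bytes wait for the slow client.
    received = []
    await send_slowly(received, body)
    return b"".join(received)
```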

7 Squid as an HTTP Server Accelerator

Now, Squid is not the only alternative for this job, but it's the only one I'm familiar with, and it is known to work well. HTTP server acceleration means that Squid sits between the clients (usually the Internet) and your web server. Squid receives the HTTP request, and if the request is for a static or otherwise cacheable object that it already has a fresh copy of, returns it immediately. Otherwise, Squid forwards the request to the actual web server, which holds the original data and is capable of generating the dynamic content. The web server returns the response to Squid, which caches the response if possible and then returns it to the client.

Since Squid, being a single-threaded cache written specifically for this purpose (in addition to being a normal web proxy/cache), is very efficient in both CPU and memory usage, serving static data this way is efficient. Dynamic page serving speed isn't significantly degraded. With some applications, it might also be possible to cache some dynamic data this way; for example, Squid could be told to cache, for 15 seconds, those dynamic pages that don't need to be completely up to date, such as a sports results page.
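The accelerator's request flow, together with the 15-second sports page, can be modelled in a toy form. This is not Squid, whose behaviour is governed by its own configuration. The origin marks the results page with the standard HTTP/1.1 header `Cache-Control: max-age=15`, which is one common way to express "may be cached for 15 seconds". Whether and how a particular Squid version honours it should be checked against that version's documentation. The names, paths and clock injection are hypothetical.

```python
import re
import time

def origin(path):
    """Hypothetical origin web server: returns (headers, body)."""
    if path == "/sports/results":
        # Results may be up to 15 s old, so allow caching for 15 s.
        return {"Cache-Control": "max-age=15"}, f"results at {time.time():.0f}"
    return {"Cache-Control": "no-cache"}, f"personal page {path}"

def max_age(headers):
    """Seconds the response may be cached, or 0 if it must not be."""
    m = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    return int(m.group(1)) if m else 0

class Accelerator:
    """Toy caching reverse proxy in front of a backend function."""

    def __init__(self, backend, clock=time.monotonic):
        self.backend = backend
        self.clock = clock
        self.cache = {}  # path -> (expiry time, body)

    def get(self, path):
        entry = self.cache.get(path)
        if entry and entry[0] > self.clock():
            return entry[1]                 # hit: the origin is not involved
        headers, body = self.backend(path)  # miss: forward to the origin
        ttl = max_age(headers)
        if ttl > 0:
            self.cache[path] = (self.clock() + ttl, body)
        return body
```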

Copyright (c) 2000 by Samuli Kärkkäinen <skarkkai@woods.iki.fi>. This material may be distributed only subject to the terms and conditions set forth in the Open Publication License, v1.0 or later (the latest version is presently available at http://www.opencontent.org/openpub/).

Originally Published: Monday, 11 September 2000	Author: Samuli Kärkkäinen