Originally Published: Monday, 5 March 2001 Author: Marcelo Pham
Published to: featured_articles/Featured Articles Page: 1/1 - [Std View]

Writing Database Oriented Web Based Applications With Perl - Part I

This week, Marcelo Pham shows Linux.com readers how to write database-driven web applications with Perl.

Index:

  1. Introduction
  2. Why Perl?
  3. System design basics
  4. Database design
  5. Next release

1. Introduction

What is a web based application? A web based application is simply an application that runs over the Internet. The programs run on the web server and interact with the user's browser, without the need of installing them on the client side. Yes, we're going back to the old thin-client architecture that mainframes used to use (IBM 34/36, RPG, wow, that's really old!). Imagine the power of this: a program that is always available, no matter where you are, no matter when you execute it, it's stays there, ready to use.

This is possible thanks to the CGI (Common Gateway Interface), a protocol that allows scripts to run in the server side sending the result via the web in a simple HTML output.

Now, a simple web based application has a little logic and some interaction with the user. We won't be talking about counters or mail forms, we want to go further than that, we want our application to interact with databases and run effectively in a heavy traffic multiuser environment as the web is. This will be the main subject of this series of articles.

This first installment is just an overview and a couple of things to have in mind before proceeding. It's mainly intended to educate developers to design and document in a standard way any application they want to implement. Some things may sound obvious, theoretic and even silly, but it's worth to mention them to do the right thing from the start and, in the long run, you'll see that they will save you a lot of time. After this introduction we will touch more specific and technical issues.

We'll talk about the development of database oriented web based applications under Linux using Perl as the programming language. We'll go over several Linux databases and you'll choose the one that you like. Basically the following topics will be covered:

(go to top)

2. Why Perl?

There are several programming languages for the CGI under Linux: PHP, Python, C, and, of course, Perl. Perl (Practical Extraction and Report Language) was born in 1986 and was invented originally for queries and reports, by Larry Wall. Then, a lot of programmers added and enhanced to made it what it is today. Perl is one of the best choice because it's simple enough to learn it quickly, and powerful enough to make virtually any kind of program or application, from simple counters to sophisticated utilities. It's fast, reliable, a versatile text manipulation tool and has worldwide support and a vast number of newsgroups and related web sites. It's also an open source software, and completely portable -you can migrate it from Linux to any other Linux or Unix flavor and Windows. Perl has become the standard language for Internet development and has gained thousands of adepts in the last years, so, for any small project you start with Perl, you will be fully supported, and for big projects, you can easily join and work smoothly with other Perl programmers. You might read or hear a bunch of critics about Perl, but the truth is, every day more and more web developers choose Perl and I -as well as thousands of programmers- will not give it up.

Most Linux distributions come with Perl. Perl is an interpreter, you will need it installed in your box in order to execute your scripts. Perl is usually installed under /usr/bin/perl or /usr/local/bin/perl.

Perl scripts are plain text files that begin with a special line which indicates the path to the interpreter, called shebang (#!/usr/bin/perl). Perl scripts must have permission to be executed, you can do it through chmod 0755 <script_name>.

A common question is, can you protect a Perl program? Technically, yes. There is a way to compile a Perl script and make it executable and unreadable, but the performance will go down drastically and, besides, you would be defeating the whole Open Source philosophy, don't you think?

For more information, tutorials and documentation about Perl, you can go to www.perl.com and www.perl.org. For documentation only, www.perldoc.com

As of today the stable version is 5.6.0. A company called Activestate is also offering a free Perl distribution. There is an archive for Perl documentation, routines and libraries called CPAN (Comprehensive Perl Archive Network). You can go there and download any kind of file, from tutorials to modules for all purposes.

Recommended books?

"Teach yourself CGI" (Rafe Colburn) Sams. This book goes over programming the CGI mainly with Perl (it includes some PHP routines), database connectivity (DBD-DBI) and everything related to the web. It's quite good.

"Programming Perl (3rd. Edition)" (Larry Wall, Tom Christiansen and Jon Orwant) O'Reilly. This one talks strictly about Perl: the language itself, syntax, modules, etc. by its creator Larry Wall. Pretty good too.

"CGI/Perl Cookbook" (Craig Patchett and Matthew Wright). I've heard about this book but I haven't had a chance to take a look at it. If you do, let us know !!

(go to top)

3. System design basics

First of all, we have to design our application. A good design comes after a good analysis of what has to be accomplished, how, when. Design means think of every little detail before going to the keyboard and hack like crazy...

Ok, I -we- used to do that before, like "why do I have to write everything down, if I have everything in my mind?". Let me explain this: if you describe in a piece of paper (any piece) what your application will do and how it will do it, you will have a better idea and an overview. Having that on paper will save you things to remember later on, and will give you a better perspectives of what you're missing and what can be improved. It's like the difference between seeing a tree from within the wood, and seeing it from above the wood.

But, it's not a matter of getting a piece of paper and draw whatever comes to your mind. There are standards for system design. We won't touch them in detail, in fact I'll be more than happy if you plan in paper everything you will do, and fully document everything you made, but, if you will be working with a group of designers and programmers, it'll be good to at least learn about the common and known methodologies.

Basically you must model your application abstracting how the system will behave and react based on what you did in the analysis stage, following certain rules and procedures. Depending on the "style" or "school" you choose, your model will vary. Structured design is quite old but it's still used and it was the first utilized methodology. It consists in representing your system behavior with flow (for actions) and entity (for data) diagrams. Structured design standards were developed by several authors like Tom De Marco, Shraer and Mellor, and the famous Edward Yourdon.

You can also use object oriented (oo) design techniques, which organizes your system elements in objects, attributes, events, classes and members. There are several "flavors" such as UML (Unified Model Language) and the ones developed by Jacobson and Booch. In future releases, if the time and space allow it, we will explain the UML in detail.

A brief overview and recommended books for the different design methods can be found here. If you have some time, go to a library and take a look at them, they are quite interesting.

Another point to have in mind when designing is the environment you will work in, the architecture and the different issues you will be dealing with. An excellent article about this matter called "Unix Web Application Architectures" can be found at the l.c. development article archive.

--To finish my attempt to convince you about how important the design stage is, let me tell you about the software life cycle. These are the stages in a software life

and these are what we call the "system pyramids".
"The good"


          /\
         /  \ <-- Maintenance
        /----\
       /      \ <-- Testing
      /--------\
     /          \ <-- Implementation
    /------------\
   /              \ <-- Design
  /----------------\
 /                  \ <-- Analysis
/____________________\

and "the Bad"


\____________________/
 \                  / <-- Maintenance
  \----------------/
   \              / <-- Testing
    \------------/
     \          / <-- Implementation
      \--------/
       \      / <-- Design
        \----/
         \  / <-- Analysis
          \/

In other words: the more you invest analyzing and designing your application, the less implementation and maintenance you will have to do. On the other side, a poor analysis and design will end up in more time for implementation and, even worse, a high maintenance schema, that will make your system very unstable.

Throughout all these articles, we will work on one common example: a contact manager. This simple application will allow the user to enter, modify, delete and inquiry the name and phone number for customers, family and friends. The user will also be able to enter different dates for every contact and categorize them, for example birthdays, anniversaries, parties, etc.

(go to top)

4. Database design

Second, since we will be designing database oriented web based applications, we're missing... the database! The database design is fundamental. Again, a bad database design will end up in a nightmare when maintaining and enhancing your application. A relational database -like Oracle, Informix, DB2 - becomes more and more sensitive to changes as it grows, so it has to be almost fully created from the very beginning, therefore, a well planned database design is necessary. Ok, let me go over the basics when designing your relational database. After designing your application, you already know what information you'll need to store.

First, define the tables that you will need: Based on the example, we could have only one table, called Contact:

CONTACT:
Code char(10) primary key,
Last_Name char(50),
First_Name char(50),
Address char(100),
Phone char(30),
Birthday date,
Anniversary date,
Party date

This will work, but... what if, after doing everything, I want to add a new date, for example Wife's birthday? or I want to add 10 more dates? The correct way to do this would be

(*1*)
CONTACT:
Code char(10) primary key,
Last_Name char(50),
First_Name char(50),
Address char(100),
Phone char(30)

DATES:
Contact_Code char(10),
Date_Description char(100),
Occurs_in date,
primary key (Contact_Code,Date_Description)

And we would put what it is in the Date_Description (Birthday, Anniversary, etc.). Mmmmhh... not good enough, with this model we can't group them by dates in, for example, a report. The best model would be

CONTACT:
Code char(10) primary key,
Last_Name char(50),
First_Name char(50),
Address char(100),
Phone char(30)

DATES:
Contact_Code char(10),
Date_Code char(10),
Occurs_in date primary key (Contact_Code,Date_Code)

DATE_CODES:
Date_Code char(10) primary key,
Description char(100)

Storing in the DATE_CODES table the different events (Birthday, Anniversary, etc. as many as the user wants). This is a very simple design but it shows you how many issues can come up with just three tables.

Second, we should ensure referential integrity. This is, if an existing attribute of one table depends on a record stored in a second table, you shouldn't be able to delete that record. For example, if I have a date_code='BIRTHDAY' (in the DATE_CODES table) and one of the rows in the DATES table is referencing this record, we should restrict the deletion of that record in the table Date_Codes. This is:


Table DATES:

Contact_Code | Date_Code | Occurs_in
-------------+-----------+-------------
TAZ          | BIRTHDAY  | 01/24/1972


Table DATE_CODES:

Date_Code  | Description
-----------+--------------------------
BIRTHDAY   | Contact birthday

We shouldn't allow this > delete from DATE_CODES where date_code='BIRTHDAY'

We can achieve this by doing:

(*2*)
DATES:
Contact_Code char(10) references Contact,
Date_Code char(10) references Date_codes,
Occurs_in date,
primary key (Contact_Code,Date_Code)

Third, thinking about performance, we must define indexes to rapidly access the information we need. We could create indexes for every field on every table but that'll take a lot of hard drive space... so we should concentrate only on those fields that are used most in inquiries and reports. In this example, Code, First_Name and Last_Name from the Contact table would be the best fields to index.

Fourth, I know... you were waiting for me to say standards... yes, there are standards in database design. One universal standard is database normalization. Normalization is simply a process that ensures a correct database design to efficiently use storage space, eliminate redundant and/or inconsistent data, and facilitate the data maintenance. In order to achieve this, you have three forms of normalization: first normal form, second normal form and third normal form.

-The first normal form indicates that there can't be repeating groups. For example, what we applied in (*1*) was the first normal form.
-Then the second normal form says that no nonkey attributes or fields depend on a portion of the primary key.
-Finally the third normal form says that no attributes or fields depend on other nonkey attributes. Some people extended this concept to 5 normal forms.
Other standard procedures when designing database include data integrity (referential integrity -what we did in (*2*)-, entity integrity, domain integrity, etc.), indexing, ERD (Entity Relationship Diagrams), and more. For example, a ERD of our three tables would be like


+--------------+       +----------------+
| Contact      |       | Dates          |       +----------------+
+--------------+       +----------------+       | Date_Codes     |
| @Code        |>----o<| @Contact_Code  |       +----------------+
| Last_name    |       | @Date_Code     |>o---o<| @Date_Code     |
| First_name   |       | Occurs_in      |       | Description    |
| Address      |       +----------------+       +----------------+
| Phone        |
+--------------+

To see all these relational database design tools in detail, you can inquiry the following links for:

or you can refer to this book:
"Fundamentals of Database systems" (R. Elmasi Shamkant and B. Navathe) Benjamin-Cummings Publishing Company
(I had to read this book for one of my last exams back in the university... it's quite good, also very long -more than 800 pages!)

Let me say this again, I'm not pretending you to be an expert in database design, but at least you should know and learn some of the tools and techniques utilized when modeling relational databases.


Summary:
-Perl is one of the best tool to develop web based application, it's simple and powerful at the same time, easy to learn, and has worldwide support through news and user groups.
-A good analysis ensures a solid design. Combining system and database design guarantees effective implementation and low maintenance in the future.
-Database design is extremely important when using relational databases like Oracle, Informix, DB2, etc.


(go to top)

5. Next issue:

In the next issue we will describe the most common databases in the market, such as mySQL, mSQL, PostgreSQL and Oracle, which one fits best your needs, and how to get and install them in your Linux box.

Marcelo "Taz" Pham is a volunteer writer in the l.c.Development Section. He develops accounting and web based applications, lives in Miami-FL, loves motorcycles and cannot hack without a Greenday CD. To contact him: marcelo@linux.com.