[Home] [Credit Search] [Category Browser] [Staff Roll Call] | The LINUX.COM Article Archive |
Originally Published: Monday, 12 March 2001 | Author: Marcelo Pham |
Published to: featured_articles/Featured Articles | Page: 1/1 - [Std View] |
Writing Database Oriented Web-based Applications with Perl - Part II
In part two of his continuing series, Marcelo walks us through the process of choosing and installing a database engine in Linux.
|
6. Databases for Linux
7. PostgreSQL
8. mySQL
9. Oracle
10. Which
one should you use?
11. Next issue
There are several databases for Linux. Right now the most used databases are PostgreSQL, mySQL and Oracle. Others not so often used but compatible are mSQL, Informix and DB2. In this installment we will talk a little about each one, how to get them and which one is best tailored to your needs, assuming that you have already designed your application and outlined your database.
Usually a database is installed in your Linux box and runs as a daemon process to service requests such as inquiries and updates. Each database has its own daemons and they work differently from one flavor to another. For example, Oracle uses a daemon to monitor the database (called Server Manager) and another one for each database (called listener), PostgreSQL only has one daemon for all its databases (called postmaster). These services are just regular processes running on the server in a specific port, like the httpd daemon. Usually you must shut them down with utilities that come with the database; if you kill the running processes or shut down the server without stopping the services, you may damage the database files.
A database can be running on the same server as the application, or on a dedicated box. If the application is complex and the database is often updated, it'd be a good idea to separate the database and the application server to improve performance. If the application is simple you can install the database in the same server. From your code you can connect to the database regardless its location.
When you separate the application and the database servers, that's called a two-tier architecture. Other architectures are called n-tier where n denotes the quantity of independent layers that your system contains.
Databases save data files in different ways. For example, PostgreSQL and mySQL save them in a directory with the database name. Oracle requires you to create dedicated partitions for parts the database: the data, the programs and the C libraries. There are also utilities (shell or command line tools) with which you can do administration tasks from the console window. You'll see them in detail in each database overview.
Most databases require special groups and users to be created during the installation, in order to do all the administrative tasks, so you don't have to do them as root, for security purposes. Normally you will create a specific account for the database -database superuser- and login with that account whenever you want to maintain the database.
Also, let me tell you that mySQL and PostgreSQL are open source databases (under GPL), while Oracle and miniSQL (or mSQL) are not. You can download a single-user non-commercial version of Oracle 8i at the Oracle Technology Network (OTN). You must create an account, but it's free, and you can download mSQL for free if you register in their website.
And, before proceeding, the following structure is utilized for each product:
i) A brief history of...: this section tells you a
little bit about how the database was born
ii) Where to get it: links to web and ftp sites where
you can download the full version (for databases released under
GPL) or the personal version (for commercial databases)
iii) How to install it: a summary of steps to follow
before and during installation. Remember, this does not replace
the installation instructions that come with the database, it's
just an overview to let you know how it will be and what you
will need.
iv) Starting the daemon: this part explains how to
start the different background processes to make the database
run. It also includes some routines to automate the
startup.
v) Creating a database: the basic syntax to create a
database
vi) Running the database shell: how to run the database
command prompt where you can type and execute commands to
maintain, update, inquiry and monitor the database and its
elements (tables, rules, sequences, etc.)
vii) The good and the bad side: good and bad things in
a few words.
viii) Comments: these are comments based on my own
experiences
Then, I finish this second part with a recommendation of which database you should use, why, and some recommended books. Enjoy-!
i) A brief history of PostgreSQL: PostgreSQL is a
powerful database engine that runs in almost all Unix and Linux
flavors.
In 1986, at University of Berkeley, California,
Michael Stonebreaker started developing an object
relational database server -Postgres- with a team of C
programmers. Then, Andrew Yu and Jolly Chen took
over the project and added SQL capabilities. Yu and Chen left
the University, but kept maintaining and enhancing Postgres
(helped by volunteers from different countries), and then
renamed it to PostgreSQL.
PostgreSQL is an open source software and has gained
lots of fans -like myself- from all over the world.
ii) Where to get it: You can download it for
free, from the PostgreSQL
website or from the official ftp site. The latest release is
7.0.3 and they're working on the relelase 7.1,
which promises to have a lot of enhancements. If you're getting
Postgres from a Linux distribution and it's older version than
7.x, I highly recommend you to upgrade it since it is much more
stable and reliable.
iii) How to install it: After downloading the
compressed file (it usually comes in tar/gz format) in a tmp
directory, you can follow basic directions specified here.
The tar/gz file comes with a file called INSTALL -that contains
detailed instructions.
iv) Starting the daemon: As I mentioned before, you must start the Postgres daemon, called postmaster. The program that launches the daemon is called pg_ctl and is located under the Postgres bin directory.
If you followed the installation instructions, you should now have a Postgres superuser (su) account. After logging in as the Postgres su you must export the environment variables so the postmaster will know where to look for programs, libraries and data. After that, you can start the postmaster with the pg_ctl program by typing:
$ cd /usr/local/pgsql/bin
$ ./pg_ctl -D /usr/local/pgsql/data start
In order to automate the exports and the postmaster startup, I'd suggest you to modify your .bash_profile script for the Postgres superuser.
The postmaster will start in the default port 5432. The -D option indicates where the database and configuration files reside. These files contain information about the default port, the current postmater pid, library locations, and more. Of course, start is the parameter to start the daemon. If you do a ps -ef, you will actually see two processes running: the pg_ctl and the postmaster.
For security reasons, you should never start the daemon as root.
This process will take care of all the database requests, even if you are connecting to different databases, but they all should be stored under /usr/local/pgsql/data.
If you ever need to shutdown the server or the database server, don't ever do a ps -ef and kill the running processes. Instead, from the Postgres superuser session, do
$ ./pg_ctl -D /usr/local/pgsql/data stop
This will safely shut down the services.
v) Creating a database: Let's create a test database
called 'testdb'. Log in as the Postgres superuser and type:
$ cd /usr/local/pgsql/bin
$ createdb testdb
CREATE DATABASE
then type
$ cd /usr/local/pgsql/data/base
$ ls -l
You will notice a new directory, called testdb. If you look under this directory, you will see all the files (those beginning with pg) that Postgres needs in order to manage the database you just created.
vi) Running the database shell: Now that you've
already created the database, let's create a table.
For database administration purposes, all databases come with
a shell, which can be called from the Linux console. From this
shell you can do all the administration and maintenance tasks.
Postgres comes with psql (very similar to the one Oracle
has).
Log in as Postgres superuser, and type
$ cd /usr/local/pgsql/bin
$ psql testdb
to connect to your database. The generic syntax would
be
$ psql <dbname>
When the db shell prompt is ready,
testdb=# create table contact (Code char(10) primary key,Last_Name char(50), First_Name char(50),Address char(100),Phone char(30));
(Don't forget the last semicolon to indicate that the command has ended. You can enter several lines -for easier reading- and execute the whole statement by typing \g. You can press the up arrow to recall old commands so you don't have to type them again, in version 7.0 and higher)
This will create the table contact. To continue with
the example from the Part I, create the rest of the tables with
the specified layouts.You can get help for the shell commands
by typing \h, or by inquiring the Postgres manual (you can
download it from the postgres
website documentation section)
Go to another session and login as root. Then,
# cd /usr/local/pgsql/data/base/testdb
# ls -l | more
You will notice some new files. These files are the tables that you just created from the shell. Postgres stores the tables as files under the directory with the database's name, along with all the indexes, relationships, stored procedures, etc. For example, the table files are named with <table_name>, and the indexes and primary keys with <table_name>_pkey.
vii) The good side and the bad side of Postgres:
-The Good side: includes production database features
(transactions, triggers, views, foreign keys, locking). Very
solid and has lots of user groups.
-The Bad side: no database replication, full text indexing,
maximum of 8 Kb per row (they're extending it to 32 Kb in the
7.1 version) and loses performance with many concurrent
connections.
viii) Comments: In my opinion, Postgres is the best open source relational database engine for developing solid web-based applications. It's fast, reliable, easy to configure, administer and maintain, and, as with any other open source software, it has worldwide support. It handles transactions, rules, triggers and even memory cursors.
I am currently developing all my web-based applications with
Postgres 7.0.3. I highly recommend it if you want to implement
an application that involves complex database operations such
as transactions and masive global updates.
i) A brief history of mySQL: There is not much
information about how mySQL was born. All I know is that
Michael 'Monty' Wideneus was leaded a project more than 10
years ago and he needed an application to interact with a
database. He tried miniSQL (mSQL) but after playing with it for
a while he noticed that it wasn't fast and flexible enough for
his needs. He decided to make a new SQL interface with the same
API interface mSQL uses, but with source code that could be
modified by anyone. Other collaborators enhanced and improved
on his work. It hasn't been discovered why there is a 'my'
before the SQL.
mySQL is an open source database and has been installed
on plenty of Linux web servers all over the world. Notice that
most web-hosting companies offer mySQL support. It's becoming a
standard in web-hosting services.
ii) Where to get it: You can download mySQL for
free, from the mySQL official
website. The latest stable version is 3.23. mySQL also
comes with some Linux distributions (Mandrake, RedHat, Stormix, SuSe and TurboLinux) but versions may
vary.
iii) How to install it: You have different formats to
choose from while browsing or downloading the installation
documentation in the documentation section
of the website. You should read these documents.
After the initial installation, don't forget to
apply the necessary patches (you can find them in the website,
along with the installation instructions).
Once the installation is done, you should run a utility to
configure the way mySQL will run in your box. Run the configure
utility:
$ ./configure
(Do a ./configure --help for syntax help). You will
be able to configure the port, the socket and the path to the
mySQL libraries and configuration files.
After that, you have to create grant tables that mySQL uses
internally, by executing the mysql_install_db
utility.
iv) Starting the daemon: mySQL has only one daemon that services all the requests. This daemon can be started in several ways. If you want to start it manually, you can do so by typing:
$ ./safe_mysqld --user=<user> &
or simply
$ ./mysqld &
(safe_mysqld will try to determine first the best options to run mysqld). This will start the server with the parameters specified by the configure utility. mySQL usually runs in port 3306, but you can change that if you want to. You can test if the server is running with the mysqladmin utility
$ ./mysqladmin version
$ ./mysqladmin variables
If you want to start/stop the server automatically, you can use the mysql.server script
$ ./mysql.server start
$ ./mysql.server stop
and add it conveniently in your Linux startup files, or place it in your database superuser ./bash_profile script. To shutdown the server manually, type:
$ ./mysqladmin -u root shutdown
All these programs are usually in the mySQL installation /bin directory.
v) Creating a database: To create a database, you must run the mysqladmin utility. This utility does not only create databases, it also does other administrative tasks:
$ ./mysqladmin create testdb
will create a database called testdb. To get help in mysqladmin commands
$ ./mysqladmin --help
mySQL comes with other many utilities to maintain the database, you can take a look at them in the manual, Chapter 14. They're all scripts under the /bin directory.
mySQL usually stores its database files under /usr/local/mysql/data. You'll see there the database and log files.
vi) Running the database shell: the mysql command line tool is called -of course- mysql. You can run it by typing
$ mysql <database name>
When the shell prompt is ready,
testdb=# create table contact (Code char(10) primary key,Last_Name char(50), First_Name char(50),Address char(100),Phone char(30))\g
('\g' denotes the end of the command. It will execute
the sentence that you just typed. You can press the up/down
arrow keys to recall old commands).
To get help for the shell commands, simply type -? or
--help.
vii) The good side and the bad side of mySQL:
-The Good side: excellent performance and reliability, very
popular, worldwide support and lots of user groups. Supports
database replication.
-The Bad side: no transactions, rules, views and subselects.
Poor table locking mechanisms.
viii) Comments: mySQL is a simple and fast database
engine, supports large databases, is very popular in
web-hosting companies and easy to maintain. It's also open
source and has a lot of user groups. However, it loses some
standard SQL features. For example, it does not have
transaction capabilities (commit/rollback), the table/record
locking is quite different (it uses global variables as flags.
As a hacker, I don't like the idea of global
variables much) and is not 100% with SQL ANSI standards.
I had to review and modify all of my stored procedures for
table creation, when I moved one of my applications from mySQL
to Oracle and PostgreSQL. It was a quick job, but still I had
to go script by script and check them line by line, the column
types, the index and primary key definition syntax, etc.
The transaction safe table handler is included in the mySQL
TODO list. There is a workaround if you install the Berkeley DB
(Berkeley DataBase, from
the BSD creators), which is an add-on to create BDB tables
within mySQL. I've never tried it, so I can't tell you how well
it works.
mySQL does have database replication, it maintains a secondary
backup database if the primary fails and the configuration is
simple.
From my point of view, if you want to develop an application
that does not involve multiple table updates or complex
transactions, mySQL is your best choice, since it's quite fast,
easy to configure, stable and popular in web-hosting companies
-you'll have no problems finding a web-hosting with mySQL.
However, if your application requires multiple or simultaneous
table operations, tricky reports that need temporary workspace
to do subqueries to get the final result, or interaction with
external or cross-platform databases, you should consider PostgreSQL or Oracle.
i) A brief history of Oracle: Back in 1977,
Larry Ellison and two partners founded Software Development
Laboratories, and decided to build a new relational
database engine, inspired by an IBM white paper. This project
was called Oracle (meaning a source of wisdom) and they
enhanced it with SQL capabilities. Soon they changed the
company name to what it is today, Oracle Corp.
Oracle is currently one of the best databases for both
business and Internet applications. It was originally designed
for Unix, DEC and IBM mainframes, and then extended to OS/2,
Linux and Windows NT environments. They've also developed
financial applications (Oracle Relational Financial
Applications) that are used by big corporations like Sony, and
also a complete development solution for e-business called
Oracle Internet Platform.
Oracle is not open source software, it is expensive and
mostly intended for big companies. It's extremely solid,
reliable, and incredibly stable on Unix based platforms.
ii) Where to get it: You can download a single
user-single server Oracle 8i for free, from the Oracle Technology Network
(OTN). You must register but the registration is free, they
just send you invitations to seminars and news about products
from time to time. You can also buy a CD that comes with an
Administration book and additional documentation, from MacMillan Software.
iii) How to install it: If you downloaded it from the
OTN, it should come with a HTML file containing installation
instructions. Otherwise you can browse the OTN website for
installation instructions. If you bought the CD, the document
is located in /mnt/cdrom/doc/lin.8xx/index.htm (xx is
the version, i/e mine is lin.815 for 8.1.5).
Oracle installation is very picky, read the installation
instructions carefully. I installed Oracle about 15 times when
I started playing with it!
Please remember that Oracle 8.x requires:
Also, Oracle needs at least two mount points in your box: one for the programs and libraries, and the other for the database files.
* Attention RedHat users! There is a script that you have to run during the installation called runIns.sh, located under the /install directory. For RedHat users, this script won't work, you must execute another version of runIns.sh script which is located under a different directory (it varies in every version of 8.x). Please refer to your installation documentation for further information, or go to the OTN website, but be aware of this compatibility issue.
This is only a brief description of how to install Oracle. The documentation is quite extensive so I'd suggest you to read it from the book or print it from the web prior to the installation.
iv) Starting the daemon: You must start two daemons
for Oracle. One of them is called the Server Manager, which
administers the main services; the other one is called listener
and satisfies all the database requests.
You should have created a Oracle superuser account. After
logging in as the Oracle su you must export the environment
variables so the Server Manager will know where to look
for programs, libraries and data.
Then, you can start the Server Manager by typing:
$ dbstart
(the server manager will start)
and you can start the 'listener' (or Net8 listener) by
typing:
$ lsnrctl start
(the listener will start)
The listener will start in the default port 1521. To see all the listener parameters
$ lsnrctl services and
$ lsnrctl status
To automate the server manager and listener startup, you can
include the statements in the .bash_profile
script for the Oracle superuser.
If you like to start the daemon processes from the root login,
you can add this to the .bash_profile script for the
root user
su - oracle8i -c "dbstart"
su - oracle8i -c "lsnrctl start"
(being oracle8i the Oracle superuser)
There are a lot of configuration files involved in the database warm and start up, it'd be tedious to list them all in this article. I can tell you the most important ones:
You should keep an eye on them.
To shutdown the server manager and the listener, from the
Oracle superuser session, type:
$ lsnrctl stop
(the listener will stop)
$ dbshut
(the server manager will stop)
This will safely shut down the services.
v) Creating the database: To create a database,
you will need to use the GUI assistant called dbassist. First,
logout your X-Window session and login again as the Oracle
superuser. Second, from your character based Oracle superuser
session, type:
$ dbassist &
and switch to the X-Window session (it may take a while to
start). You will see a GUI assistant that will guide you
through the database creation. It's quite intuitive, all you
have to do is follow the instructions.
Throughout the creation process the assistant will ask you for
the database type: OLTP (On Line Transaction Process), DSS
(Decision Support System) or Hybrid (takes advantage of both
worlds). I usually select Hybrid unless I'm making a very
specific application, so I recommend you to do the same. Then,
simply follow the rest of the instructions. It takes a while to
create the database depending on your hardware and free disk
space, so be patient !
vi) Running the database shell: The database
shell for Oracle is called Sqlplus.
Make sure that the server manager and listener are started,
log in as the Oracle superuser and type
$ sqlplus <username>/<password>@<service
name>
When the db shell prompt is ready,
testdb=# create table contact (Code char(10) primary key,Last_Name char(50), First_Name char(50),Address char(100),Phone char(30))
Sqlplus doesn't have a command history, so by pressing the arrow keys will only make garbage on the screen ...
For further information about sqlplus, please go to the OTN website.
Oracle usually stores the database files under a specific
directory with the database name. What Oracle recommends is two
mount points, like:
/u01 for Oracle programs and libraries
/u02 for Oracle database files
The database files would be located under:
/u02/oracle/oradata/<database name>: control files for
database
/u02/oradata/<database name>: actual database files
You will notice files with .dbf extension under both directories. The ones under /oradata are the tables you create, plus all the indexes, serializers, relationships, stored procedures, etc.
vii) The good side and the bad side of Oracle:
-The Good side: Excellent performance, reliability,
scalability, robustness in any kind of environment (Internet,
intranet, extranet). Other Oracle products enhance the database
server and make it a complete development environment. Plenty
of documentation and user groups. I think it's the best
database server out there.
-The Bad side: Commercial and expensive (although it's worth
the price). Very tricky to configure and maintain (you must
really know Oracle to administer it).
viii) Comments: Oracle is a powerful, reliable, excellent fault-tolerance database engine. eBay, Amazon and other high traffic websites use it for their database needs, that shows you how solid it is. It handles transactions, triggers, cursors, database replication and distributed databases. From my point of view, Oracle is the best database for Unix and Sun architectures out there. Unfortunately, it's quite expensive, is not open source and is very hard to configure, maintain and get free support. To be an Oracle dba you have courses that last a year or so and cost good money. I highly recommend this database engine only if you are working for a large company that can afford to buy it and hire an Oracle dba. If you're not, you should consider PostgreSQL.
10. Which database should you use?:
Well, it all depends on what you want to do. If you are developing a complex application (for example, an accounting module or a distribution logistic system) I would recommend you to choose Postgres, or, if the company you are working for can afford it, Oracle. Postgres is a serious RDBMS (Relational DataBase Management System) and has almost all Oracle features, but is open source and free, and is always open to modifications if you want to customize the engine.
The key for complex web-based applications are the transaction handler, the locking procedures and the triggers or rules. In a hostile high traffic environment like the Internet, connection problems, possible attacks and power outages are daily situations, and the database must be solid and robust enough to maintain integrity and avoid any kind of data corruption.
Postgres' performance may go down with a considerable number of concurrent connections, but it'll always keep the database safe.
Oracle, on the other hand, is so solid that the performance as well as the data integrity will remain the same no matter how many concurrent connections you have. On the down side, it is pretty tricky to configure (flexibility comes with complexity), expensive, and requires -for a serious installation- good hardware. But the wonderful thing is... if you develop an application for Postgres, you can easily migrate it to Oracle with no major modifications (the only problem would be transferring the data from Postgres to Oracle, but that's another chapter...).
If you want to implement a system that doesn't require
transactions, but must have quick database response with many
concurrent connections in small or large databases, mySQL is
your best call. It's open source so you can also customize it
to fit your needs, easy to install-configure-maintain, and
stable and fast in Internet environment with large databases.
It's also safe since it supports database replication. And, you
won't have any problem getting a web-hosting or dedicated
server with mySQL support.
Another choice, if you can is to mix both databases (mySQL
and Postgres). You can design an application that uses the two
databases
miniSQL is the original version of mySQL, I've never installed or used it, but it lost popularity and is not as enhanced as mySQL.
O'Reilly networks just released a new website called LAMP (Linux-Apache-Mysql-Perl|PHP|Python). It has everything related to the mentioned open source technologies, you can go there to find out more about these architectures.
To finish this second installment, I would suggest you to take a look at these interesting books to have a better idea when choosing your favorite database:
"Oracle 8: The Complete Reference" (George B.
Koch,Kevin Loney) - Osborne/McGraw-Hill
"Postgresql: Introduction and Concepts" (Bruce Momjian)
- Addison/Wesley
"Postgresql Programmer's Guide" (PostgreSQL Development
Team,Thomas Lochart) - iUniverse.com
"MySQL" (Paul DuBois) - MacMillan Publishing Co.
"MySQL and mSQL" (Randy Jay Yarger,Tim King,George
Reese,Andy Oram) - O'Reilly & Associates
Or, you can go to phpbuilder.com and see an excellent and exhaustive comparison between mySQL and Postgres, by Tim Perdue.
Summary:
-Choosing the database engine carefully for your application
is a must-do before you start programming
-PostgreSQL and Oracle are the most recommended databases
engines for complex applications. PostgreSQL is free under GPL;
Oracle is not
-mySQL is the best choice for fast, reliable web-based
applications and has a lot of related websites and user
groups.
The next issue will be 'Interfacing Perl with a Database (I)'. Part three of this series will go over DBD-DBI architectures, how to get and install them, and the code to connect, retrieve and save data to the database.
Marcelo "Taz" writes for the Linux.com Develop Section. He's an active member of the Florida Linux User Xchange (please visit www.flux.org), and, when not hacking, he likes to hang out with friends and travel with his soulmate Carina.