Writing a Chemistry Program
With this basic introduction to words in hand, let's write a
simple chemistry library in words that checks equations of the form
"2H2 + O2 -> 2H2O" and reports whether they are balanced. In
chemistry, a balanced equation has the same number of atoms of each
element on both sides of the equation. The finished program can then be
used as a sanity check for chemical processes that involve a large
number of reactions.

All of the examples in this tutorial are available under the
GPL and can be found here.
The first step in writing a word-oriented program is to identify
all of the elements in the problem domain and assign symbols to
them. The next is to use the matching structure in words to create
relationships between words that define how those symbols can be
grouped together and what those relationships mean.
This is similar to how we solve problems in everyday life. On
encountering a new set of data, we invent symbols to represent
entities and create relationships between those symbols to reflect
the relationships in the world. In the chemistry equation, "2H2 +
O2 -> 2H2O," for example, the fact that two hydrogen molecules
are being added to an oxygen molecule to form two water molecules
is explicitly represented in the syntax. This explicit
representation of relationships makes it easier to understand and
maintain code.
There are seven basic entities in a chemical reaction: numbers,
subscripts, atoms, molecules, the production symbol, addition, and
the reaction itself. Let's define symbols for each of these
words:
- Numbers: In chemical reactions, the number preceding a
molecule specifies how many of that molecule take part in the
equation. 2H2O, for example, specifies that there are two H2O
molecules. From our earlier introduction to regular expressions, we
know that we can match numbers with the symbol "[0-9]*". Because of
the "*", the number may be omitted, as in O2.
- Subscripts: Subscripts specify the number of atoms of a certain
type that are in a molecule. In H2O, for example, the subscript "2"
after H specifies that there are two hydrogen atoms in water. Since
we are restricting ourselves to ASCII in this example, the
subscript symbol is identical to the Number symbol.
- Atoms: Atoms are the basic units of matter that make up
molecules; they are most commonly listed in the
periodic table. Atoms are written as either a single capital
letter (e.g., "H" for hydrogen) or a capital letter followed by a
lowercase letter (e.g., "Cl" for chlorine). The regular
expression to match one capital letter optionally followed by one
lowercase letter is "[A-Z][a-z]{0,1}".
- Molecules: Molecules are made out of sets of atoms. H2O, NaCl,
and CH4 are all molecules. In reaction equations, molecules come in
groups like 4CH4 and 2H2O. Our symbol for molecules must express
one or more Atom words grouped together. The abstract regular
expression to match a number followed by one or more subscripted atoms is
"{#Number}({#Atom}{#Subscript})+"
- Production: The production symbol in a chemical equation
separates the reactants from the products of the reaction. Normally, it
is in the form of an arrow. We'll use a dash and a greater-than
symbol and match the simple symbol "->".
- Addition: The addition symbol is used to represent the process
of adding two chemicals together. We'll use the traditional "+"
symbol to represent addition. Since the "+" symbol is a reserved
character in regular expressions, it needs to be escaped with a
backslash. The resulting symbol for addition is "\+".
- Reaction: The chemical reaction itself expresses the
transformation of one group of molecules into another group. With
our knowledge of abstract regular expressions, we can represent
this as
"\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*\s*{#Production}\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*".
The "\s*" terms allow optional whitespace between the symbols,
including after each "+", so that "2H2 + O2 -> 2H2O" matches.
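To see how the abstract symbols compose, here is a minimal Python sketch, separate from the word framework, that expands the {#Name} placeholders into ordinary regular expressions with the standard re module. The SYMBOLS table and the expand and is_reaction helpers are hypothetical names introduced only for illustration; the real matcher may resolve references differently. The sketch allows optional whitespace on both sides of the addition symbol so that inputs such as "2H2 + O2 -> 2H2O" match under plain re semantics.

```python
import re

# Plain-regex approximations of the symbols defined in the list above.
SYMBOLS = {
    "Number": r"[0-9]*",
    "Subscript": r"[0-9]*",
    "Atom": r"[A-Z][a-z]{0,1}",
    "Addition": r"\+",
    "Production": r"->",
    "Molecule": r"{#Number}({#Atom}{#Subscript})+",
    "Reaction": (r"\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*\s*{#Production}"
                 r"\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*"),
}

def expand(name):
    """Recursively replace each {#Name} reference with that symbol's pattern."""
    pattern = SYMBOLS[name]
    return re.sub(r"\{#(\w+)\}",
                  lambda m: "(?:" + expand(m.group(1)) + ")",
                  pattern)

REACTION = re.compile(expand("Reaction"))

def is_reaction(text):
    """Return True if the whole string has the shape of a reaction."""
    return REACTION.fullmatch(text) is not None

if __name__ == "__main__":
    print(is_reaction("2H2 + O2 -> 2H2O"))
    print(is_reaction("2H2 + O2"))
```

Note that this only checks the shape of the input; whether the equation is balanced is a separate question that the Reaction word answers.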
With these definitions in hand, we are now ready to start
writing the program. Here is the number word.
<?xml version="1.0"?>
<word:word xmlns:word="http://www.dloo.com" >
<!-- Specify the Match to match against-->
<word:symbol name="Number">
<word:match matcher="absregex" expression="[0-9]*"></word:match>
</word:symbol>
<word:includes>
<word:include>string</word:include>
</word:includes>
<word:definition>
<word:method name="finishedParsing" >
<word:access></word:access>
<word:code href="Python" >
self.count = 1
if (len(self.fMatch) != 0):
    self.count = string.atoi(self.fMatch)
</word:code>
<word:return>
<word:variable></word:variable>
</word:return>
</word:method>
<word:method name="getNumber" >
<word:access></word:access>
<word:code href="Python" > </word:code>
<word:return>
<word:variable name="self.count"></word:variable>
</word:return>
</word:method>
</word:definition>
</word:word>
Note that it matches any numeric string and then converts the
string to a number; an empty match leaves the count at its default
of 1. The subscript word is similar. Although they
match the same symbol in ASCII, Number and Subscript do not clash
because they are called in different contexts, as can be seen in the
Molecule word below.
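The conversion performed in finishedParsing can be tried outside the framework with a small Python sketch. Here count_from_match is a hypothetical stand-in for the method, with the matched text passed in as a plain string, and the built-in int is used in place of string.atoi, which Python 3 no longer provides.

```python
def count_from_match(matched_text):
    """Mirror finishedParsing: an empty match means a count of 1."""
    count = 1
    if len(matched_text) != 0:
        count = int(matched_text)
    return count

if __name__ == "__main__":
    print(count_from_match(""))    # no leading number, as in O2
    print(count_from_match("2"))   # leading number, as in 2H2O
```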
The atom word matches an element symbol such as "S" or "Cl". It
has properties that represent how many electrons, neutrons, and
protons an atom has.
<?xml version="1.0"?>
<word:word xmlns:word="http://www.dloo.com" >
<!-- Specify the Match to match against-->
<word:symbol name="Atom">
<word:match matcher="absregex"
expression="[A-Z][a-z]{0,1}"></word:match>
</word:symbol>
<word:definition>
<word:method name="start" >
<word:access></word:access>
<word:code href="Python" >
self.numberOfElectrons = 0
self.numberOfProtons = 0
self.numberOfNeutrons = 0
</word:code>
<word:return>
<word:variable></word:variable>
</word:return>
</word:method>
</word:definition>
</word:word>
Here's our definition for molecule. It matches any number of
atoms that are combined together like H2O or CH4.
<?xml version="1.0"?>
<word:word xmlns:word="http://www.dloo.com" >
<!-- Specify the Match to match against-->
<word:symbol name="Molecule">
<word:match matcher="absregex"
expression="{#Number}({#Atom}{#Subscript})+"/>
</word:symbol>
<word:definition>
</word:definition>
</word:word>
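As a rough illustration of what the Molecule symbol captures, the sketch below uses Python's re module to split a molecule into its leading number and its (atom, subscript) pairs. parse_molecule is a hypothetical helper, not part of the word framework, and it assumes that an empty number or subscript means 1, the same default the Number word uses.

```python
import re

# Number followed by one or more subscripted atoms.
MOLECULE = re.compile(r"([0-9]*)((?:[A-Z][a-z]?[0-9]*)+)")
# One atom with its optional subscript.
ATOM = re.compile(r"([A-Z][a-z]?)([0-9]*)")

def parse_molecule(text):
    """Return (molecule_number, [(atom, subscript), ...]) for text like '2H2O'."""
    match = MOLECULE.fullmatch(text)
    if match is None:
        raise ValueError("not a molecule: %r" % text)
    number = int(match.group(1)) if match.group(1) else 1
    atoms = [(symbol, int(sub) if sub else 1)
             for symbol, sub in ATOM.findall(match.group(2))]
    return number, atoms

if __name__ == "__main__":
    print(parse_molecule("2H2O"))
    print(parse_molecule("NaCl"))
```

In the word version, the same pieces are available as the Number, Atom, and Subscript words nested inside each Molecule word.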
The Addition and Production words are very simple. They match "\+"
and "->", respectively.
<?xml version="1.0"?>
<word:word xmlns:word="http://www.dloo.com" >
<!-- Specify the Match to match against-->
<word:symbol name="Addition">
<word:match matcher="absregex" expression="\+"/>
</word:symbol>
<word:definition>
</word:definition>
</word:word>
<?xml version="1.0"?>
<word:word xmlns:word="http://www.dloo.com" >
<!-- Specify the Match to match against-->
<word:symbol name="Production">
<word:match matcher="absregex" expression="->"/>
</word:symbol>
<word:definition>
</word:definition>
</word:word>
Finally, here's a chemical reaction, which can be of the form 2H2
+ O2 -> 2H2O. The methods in the reaction check whether the
reaction is balanced.
<?xml version="1.0"?>
<word:word xmlns:word="http://www.dloo.com" >
<!-- Specify the Match to match against-->
<word:symbol name="Reaction">
<word:match matcher="absregex"
expression="\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*\s*{#Production}\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*"/>
</word:symbol>
<word:definition>
<word:method name="start" >
<word:access></word:access>
<word:code href="Python" >
self.leftSide = self.getFirstWord()
self.rightSide = None
self.balanced = 0
</word:code>
<word:return>
<word:variable name="self.balanced"></word:variable>
</word:return>
</word:method>
<word:method name="parsing" >
<word:access></word:access>
<word:code href="Python" >
currentWord = self.leftSide
while (currentWord != None):
    if (currentWord.getName() == "Production"):
        self.productionWord = currentWord
        self.rightSide = currentWord.getNextWord()
    currentWord = currentWord.getNextWord()
self.checkEquation()
</word:code>
<word:return>
<word:variable name="self.balanced"></word:variable>
</word:return>
</word:method>
<word:method name="checkEquation" >
<word:access></word:access>
<word:code href="Python" >
currentWord = self.leftSide
alreadyParsed = []
self.balanced = 1
# Walk the words on both sides of the equation so that atoms
# appearing only on the right side are checked as well.
while (currentWord != None):
    # For each molecule, iterate through
    # its atoms
    if (currentWord.getName() == "Molecule"):
        # For each atom not yet checked, compare the
        # number of atoms on both sides.
        currentPart = currentWord.getFirstWord()
        while (currentPart != None):
            if (currentPart.getName() == "Atom"):
                # Get its match and add it to already
                # parsed.
                atomString = currentPart.getMatch()
                if atomString not in alreadyParsed:
                    alreadyParsed.append(atomString)
                    leftAtomCount = self.countAtoms(atomString, self.leftSide, self.productionWord)
                    rightAtomCount = self.countAtoms(atomString, self.rightSide, None)
                    if (leftAtomCount != rightAtomCount):
                        self.balanced = 0
            currentPart = currentPart.getNextWord()
    currentWord = currentWord.getNextWord()
</word:code>
<word:return>
<word:variable name="self.balanced"></word:variable>
</word:return>
</word:method>
<word:method name="countAtoms" >
<word:variables>
<word:variable name="atomString" type="string"/>
<word:variable name="currentWord" type="word"/>
<word:variable name="termination" type="word"/>
</word:variables>
<word:access></word:access>
<word:code href="Python" >
atomCount = 0
while (currentWord != termination):
    # For each molecule, iterate through
    # its parts
    if (currentWord.getName() == "Molecule"):
        moleculeNumber = 1
        currentPart = currentWord.getFirstWord()
        while (currentPart != None):
            if (currentPart.getName() == "Number"):
                moleculeNumber = currentPart.getNumber()
            if (currentPart.getName() == "Atom"):
                # Compare its match with the atom
                # being counted.
                currentAtomString = currentPart.getMatch()
                if (currentAtomString == atomString):
                    # The subscript, if any, directly
                    # follows the atom.
                    subscriptNumber = 1
                    subscriptWord = currentPart.getNextWord()
                    if (subscriptWord != None):
                        if (subscriptWord.getName() == "Subscript"):
                            subscriptNumber = subscriptWord.getNumber()
                    # Add every occurrence, so an atom that appears
                    # twice in one molecule is counted both times.
                    atomCount += subscriptNumber * moleculeNumber
            currentPart = currentPart.getNextWord()
    currentWord = currentWord.getNextWord()
</word:code>
<word:return>
<word:variable name="atomCount"></word:variable>
</word:return>
</word:method>
<word:method name="isBalanced" >
<word:access></word:access>
<word:code href="Python" >
</word:code>
<word:return>
<word:variable name="self.balanced"></word:variable>
</word:return>
</word:method>
</word:definition>
</word:word>
The reaction word in the document is instantiated as the
chemical reaction is matched. If "2H2 + O2 -> 2H2O" is entered,
the program will run the document, matching all the words and
checking if the equation is balanced. If the equation is balanced,
the balanced attribute of the reaction word will be set to 1;
otherwise it will be 0.
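For experimenting outside the word framework, the same check can be sketched as a standalone Python script. This is an illustrative approximation built on the standard re and collections modules, not the author's implementation; count_atoms and is_balanced are plain functions that only echo the names used in the Reaction word, and the sketch assumes the ASCII notation used throughout this tutorial.

```python
import re
from collections import Counter

# Number followed by one or more subscripted atoms.
MOLECULE = re.compile(r"([0-9]*)((?:[A-Z][a-z]?[0-9]*)+)")
# One atom with its optional subscript.
ATOM = re.compile(r"([A-Z][a-z]?)([0-9]*)")

def count_atoms(side):
    """Total the atoms of each element on one side, e.g. '2H2 + O2'."""
    totals = Counter()
    for term in side.split("+"):
        match = MOLECULE.fullmatch(term.strip())
        if match is None:
            raise ValueError("not a molecule: %r" % term)
        molecule_number = int(match.group(1)) if match.group(1) else 1
        for symbol, subscript in ATOM.findall(match.group(2)):
            subscript_number = int(subscript) if subscript else 1
            totals[symbol] += subscript_number * molecule_number
    return totals

def is_balanced(equation):
    """Return True if both sides hold the same atoms in the same amounts."""
    left, production, right = equation.partition("->")
    if not production:
        raise ValueError("missing '->' in %r" % equation)
    return count_atoms(left) == count_atoms(right)

if __name__ == "__main__":
    print(is_balanced("2H2 + O2 -> 2H2O"))
    print(is_balanced("H2 + O2 -> H2O"))
```

For "2H2 + O2 -> 2H2O", both sides total four hydrogen atoms and two oxygen atoms, so the equation is reported as balanced; "H2 + O2 -> H2O" has two oxygen atoms on the left and one on the right, so it is not.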