Originally Published: Monday, 13 August 2001 Author: Nile Geisinger and the BlueBox Team
Published to: develop_articles/Development Articles

An Introduction to Word-Oriented Programming with BlueBox

Exciting new open source projects are growing every day. Here at Linux.com we're delighted to bring them to your attention, especially projects like BlueBox, a program that introduces the concept of a linkable programming unit. These structures, known as words, can be published on the Internet and linked together to create richer software. Words have the same potential for growth as the Web since anyone can extend software published on the Internet simply by creating new words that link to it. The BlueBox team wrote this week's feature article just for Linux.com readers, so read on and get involved!

An Introduction to Word-Oriented Programming

Introduction

The Web made information discrete, linkable, and scalable. Discrete because it made it possible to distribute information a page at a time rather than as a bundle of pages bound into a book. Linkable because a reader could be transported from one Web page to another with a single click. Scalable because it made the value of a page proportional to the network of pages in which it resides.

BlueBox works by dynamically downloading the words it needs to understand documents. An introduction to BlueBox's features can be found here and an overview of its architecture can be found here.

Software today lacks each of these properties. Coded information is still pre-Web, mirroring how books and magazines were published before the appearance of dynamic content and hyperlinking. Software libraries consist of hundreds of objects bundled together that cannot coherently link to one another and do not scale between non-communicating developers.

Words bring the same properties to coded information that Web pages brought to textual information. Words are linkable programming units that allow libraries to be released as dozens of discrete units. Words can be published on the Internet and link to other words to create richer software. They have the same potential for growth as the Web since software written in words can be extended by anyone by simply creating new words that link to it.

This paper is an introduction to word-oriented programming. It starts with a simple "Hello world!" program and moves from there to a complete chemistry language. It shows how to post words on the Web to create an Internet-defined language, how to extend software by linking to existing words, and how to inherit from words to create polymorphic languages. All of the examples, like BlueBox itself, are available under the GPL for readers to download and run.

Hello World in Words

Word-oriented programmers and object-oriented programmers approach problems from different perspectives. Whereas, in the object world, programmers create objects that make up systems, in the word world they create words that make up languages. The three cardinal properties of the Web are brought to software by replacing the concept of a modular thing (i.e., an object) with the concept of a modular piece of language (i.e., a word).

Despite their different approaches, words are supersets of objects. They have methods, data, and an additional structure for relating one word to other words. The power of words comes from this additional structure which forces programmers to couple logical and semantic relationships in a problem. This coupling creates a more powerful form of inheritance and polymorphism and makes software more flexible than the traditional object model.

Here's a simple Hello word that implements "Hello world:"

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Hello">
       <word:match matcher="absregex" expression="Hello"></word:match>
   </word:symbol>
   <word:definition>
       <word:method name="start">
           <word:access></word:access>
           <word:code href="Python">
           print "Hello World!"
           </word:code>
           <word:return>
               <word:variable></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

The matcher at the beginning of the Hello word defines the word's name and its symbol. Words are instantiated when their symbol appears in a document. The word interpreter works by parsing a document with the top level word and then passing the parsing job to the next word. In this way, symbol by symbol, the word interpreter reads a document instantiating words for each symbol it encounters. The symbol in "Hello world!" is "Hello," so this word will be instantiated when a user types "Hello" or "Hello" is encountered in a document.
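
The symbol-by-symbol reading described above can be pictured with a small standalone sketch. This is not BlueBox's interpreter: it is plain Python 3, the Goodbye word and the REGISTRY list are hypothetical, and it ignores top-level words and nesting entirely. It only shows the idea of instantiating a word each time its symbol is found in a document.

```python
import re

class Hello:
    symbol = re.compile(r"Hello")

    def start(self):
        return "Hello World!"

class Goodbye:  # hypothetical second word, added only for contrast
    symbol = re.compile(r"Goodbye")

    def start(self):
        return "Goodbye World!"

REGISTRY = [Hello, Goodbye]  # assumed lookup table of known words

def interpret(document):
    """Scan left to right, instantiating a word wherever its symbol matches."""
    outputs, position = [], 0
    while position < len(document):
        for word_class in REGISTRY:
            found = word_class.symbol.match(document, position)
            if found and found.end() > position:
                outputs.append(word_class().start())
                position = found.end()
                break
        else:
            position += 1  # no word starts here; skip one character
    return outputs

print(interpret("Hello Goodbye Hello"))
```

Each match creates a fresh instance, which mirrors the statement that words are instantiated when their symbol appears in a document.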

Symbols are written in an extended form of regular expressions called abstract regular expressions. Abstract regular expressions are based on the regular expressions of Perl and Python that match groups of characters concisely:
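
As a point of reference, here is what ordinary regular expressions look like in Python 3's standard re module. The two patterns are illustrative choices, picked because similar ones appear in the chemistry words later in this article.

```python
import re

# A character class followed by a repetition: one or more digits.
digits = re.compile(r"[0-9]+")
print(digits.match("2001").group())

# An uppercase letter optionally followed by one lowercase letter.
element = re.compile(r"[A-Z][a-z]?")
print(element.findall("NaCl"))
```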

Abstract regular expressions extend regular expressions by allowing words to be matched as well. Words can be matched by embedding the name of a word, not its symbol, in curly brackets, prefaced by a "#" symbol. For example:

When words are matched in an abstract regular expression, they are instantiated and stored as a list in the word they are matched in. Thus, if the transaction word matched "{#Withdraw}{#Number} {#Currency}" and the match succeeded, the transaction word would have a list containing instantiated withdraw, number, and currency words.
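
One way to picture the transaction example is to expand each {#Name} reference into the referenced word's own expression and keep track of which words were matched. The sketch below does that in standalone Python 3. The WORDS table, the expressions for Withdraw, Number, and Currency, and the function names are all assumptions made for illustration; BlueBox's real matcher may work quite differently.

```python
import re

# Hypothetical word table: word name -> its symbol expression.
WORDS = {
    "Withdraw": r"withdraw\s*",
    "Number": r"[0-9]+",
    "Currency": r"dollars|euros",
    "Transaction": r"{#Withdraw}{#Number} {#Currency}",
}

REFERENCE = re.compile(r"\{#(\w+)\}")

def expand(expression, capture):
    """Replace each {#Name} with that word's own (expanded) expression."""
    names = []

    def substitute(match):
        name = match.group(1)
        names.append(name)
        inner, _ = expand(WORDS[name], capture=False)
        return ("(" if capture else "(?:") + inner + ")"

    return REFERENCE.sub(substitute, expression), names

def match_word(word_name, text):
    """Return [(child word name, matched text), ...] or None if no match."""
    pattern, names = expand(WORDS[word_name], capture=True)
    found = re.fullmatch(pattern, text)
    if found is None:
        return None
    return list(zip(names, found.groups()))

print(match_word("Transaction", "withdraw 20 dollars"))
```

The returned list plays the role of the list of instantiated withdraw, number, and currency words that the transaction word would hold after a successful match.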

The second part of the "Hello" word is the definition of the word. The definition contains its methods and data and is analogous to the traditional class structure in object-oriented programming. In the "Hello" word's definition, there is only a single method that prints out "Hello World!" Notice how the method specifies what language it is written in. This is because it is possible to write words in words themselves. Some of the first words that are being defined are for traditional programming languages like Perl, Python, and C.
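
Because a word is an XML document, its two parts can be inspected with any XML parser. The sketch below uses Python 3's standard xml.etree.ElementTree to pull the symbol and the method definitions out of a trimmed copy of the Hello word. It is only an illustration of the file layout, not how BlueBox itself loads words, and the variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

NS = {"word": "http://www.dloo.com"}

HELLO_WORD = """<?xml version="1.0"?>
<word:word xmlns:word="http://www.dloo.com">
  <word:symbol name="Hello">
    <word:match matcher="absregex" expression="Hello"></word:match>
  </word:symbol>
  <word:definition>
    <word:method name="start">
      <word:code href="Python">print("Hello World!")</word:code>
    </word:method>
  </word:definition>
</word:word>
"""

root = ET.fromstring(HELLO_WORD)

# First part: the symbol and the expression it matches.
symbol = root.find("word:symbol", NS)
match = symbol.find("word:match", NS)

# Second part: each method, with the language it is written in and its body.
methods = {}
for method in root.findall("word:definition/word:method", NS):
    code = method.find("word:code", NS)
    methods[method.get("name")] = (code.get("href"), code.text)

print(symbol.get("name"), match.get("expression"))
print(methods)
```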

Writing a Chemistry Program

With this basic introduction to words in hand, let's write a simple chemistry library in words that checks equations of the form "2H2 + O2 -> 2H2O" and reports whether they are balanced. In chemistry, a balanced equation has to have the same number of atoms of each element on both sides of the equation. The finished program can then be used as a sanity check for chemical processes that involve a large number of reactions.
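
Before building the words, it may help to see the balance rule itself as a few lines of ordinary code. The following standalone Python 3 sketch is not part of BlueBox and uses made-up function names. It assumes reactions are written exactly in the form shown above, with "+" between molecules and "->" between the two sides.

```python
import re
from collections import Counter

# Optional coefficient, then one or more (element symbol, optional subscript) pairs.
MOLECULE = re.compile(r"([0-9]*)((?:[A-Z][a-z]?[0-9]*)+)")
ATOM = re.compile(r"([A-Z][a-z]?)([0-9]*)")

def count_atoms(side):
    """Count atoms of each element on one side of a reaction."""
    totals = Counter()
    for term in side.split("+"):
        found = MOLECULE.fullmatch(term.strip())
        if found is None:
            raise ValueError("not a molecule: %r" % term)
        coefficient = int(found.group(1) or 1)
        for element, subscript in ATOM.findall(found.group(2)):
            totals[element] += coefficient * int(subscript or 1)
    return totals

def is_balanced(reaction):
    """True when both sides contain the same atoms in the same amounts."""
    left, right = reaction.split("->")
    return count_atoms(left) == count_atoms(right)

print(is_balanced("2H2 + O2 -> 2H2O"))
print(is_balanced("H2 + O2 -> H2O"))
```

The first reaction has four hydrogen and two oxygen atoms on each side, so it is balanced; the second has two oxygen atoms on the left and only one on the right.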

All of the examples in this tutorial are available under the GPL and can be found here.

The first step in writing a word-oriented program is to identify all of the elements in the problem domain and assign symbols to them. The next is to use the matching structure in words to create relationships between words that define how those symbols can be grouped together and what those relationships mean.

This is similar to how we solve problems in everyday life. On encountering a new set of data, we invent symbols to represent entities and create relationships between those symbols to reflect the relationships in the world. In the chemistry equation, "2H2 + O2 -> 2H2O," for example, the fact that two hydrogen molecules are being added to an oxygen molecule to form two water molecules is explicitly represented in the syntax. This explicit representation of relationships makes it easier to understand and maintain code.

There are seven basic entities in a chemical reaction: numbers, subscripts, atoms, molecules, the production symbol, addition, and the reaction itself. Let's define symbols for each of these words:

With these definitions in hand, we are now ready to start writing the program. Here is the number word.

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Number">
       <word:match matcher="absregex" expression="[0-9]*"></word:match>
   </word:symbol>
   <word:includes>
       <word:include>string</word:include>
   </word:includes>
   <word:definition>
       <word:method name="finishedParsing">
           <word:access></word:access>
           <word:code href="Python">
           # An empty match means no number was written, which counts as 1.
           self.count = 1
           if (len(self.fMatch) != 0):
                   self.count = string.atoi(self.fMatch)
           </word:code>
           <word:return>
               <word:variable></word:variable>
           </word:return>
       </word:method>

       <word:method name="getNumber">
           <word:access></word:access>
           <word:code href="Python"> </word:code>
           <word:return>
               <word:variable name="self.count"></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

Note that it matches any numeric string and then converts the string to a number; an empty match leaves the count at its default of 1. The Subscript word is similar. Although they match the same symbol in ASCII, Number and Subscript do not clash because they are called in different contexts, as can be seen in the Molecule word below.

The Atom word matches any element symbol like "S" or "Cl": an uppercase letter optionally followed by one lowercase letter. It has properties that represent how many electrons, neutrons, and protons an atom has.

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Atom">
       <word:match matcher="absregex" expression="[A-Z][a-z]{0,1}"></word:match>
   </word:symbol>
   <word:definition>
       <word:method name="start">
           <word:access></word:access>
           <word:code href="Python">
           self.numberOfElectrons = 0
           self.numberOfProtons = 0
           self.numberOfNeutrons = 0
           </word:code>
           <word:return>
               <word:variable></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

Here's our definition for molecule. It matches any number of atoms that are combined together like H2O or CH4.

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Molecule">
       <word:match matcher="absregex" expression="{#Number}({#Atom}{#Subscript})+"/>
   </word:symbol>
   <word:definition>
   </word:definition>
   </word:word>

The Addition and Production words are very simple. They match "\+" and "->", respectively.

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Addition">
       <word:match matcher="absregex" expression="\+"/>
   </word:symbol>
   <word:definition>
   </word:definition>
   </word:word>

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Production">
       <word:match matcher="absregex" expression="->"/>
   </word:symbol>
   <word:definition>
   </word:definition>
   </word:word>

Finally, here's the Reaction word, which matches chemical reactions of the form 2H2 + O2 -> 2H2O. Its methods check whether the reaction is balanced.

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Reaction">
       <word:match matcher="absregex"
   expression="\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*\s*{#Production}\s*{#Molecule}(\s*{#Addition}\s*{#Molecule})*"/>
   </word:symbol>
   <word:definition>
       <word:method name="start">
           <word:access></word:access>
           <word:code href="Python">
           self.leftSide = self.getFirstWord()
           self.rightSide = None
           self.balanced = 0
           </word:code>
           <word:return>
               <word:variable name="self.balanced"></word:variable>
           </word:return>
       </word:method>

       <word:method name="parsing">
           <word:access></word:access>
           <word:code href="Python">
           # Find the production symbol that separates the two sides.
           currentWord = self.leftSide
           while (currentWord != None):
                   if (currentWord.getName() == "Production"):
                           self.productionWord = currentWord
                           self.rightSide = currentWord.getNextWord()
                   currentWord = currentWord.getNextWord()
           self.checkEquation()
           </word:code>
           <word:return>
               <word:variable name="self.balanced"></word:variable>
           </word:return>
       </word:method>

       <word:method name="checkEquation">
           <word:access></word:access>
           <word:code href="Python">
           currentWord = self.leftSide
           alreadyParsed = []
           self.balanced = 1
           # Walk the words on both sides so that an atom appearing
           # on only one side is also caught.
           while (currentWord != None):
                   # For each molecule, iterate through its atoms.
                   if (currentWord.getName() == "Molecule"):
                           currentPart = currentWord.getFirstWord()
                           while (currentPart != None):
                                   if (currentPart.getName() == "Atom"):
                                           # Get its match and check each
                                           # kind of atom only once.
                                           atomString = currentPart.getMatch()
                                           if atomString not in alreadyParsed:
                                                   alreadyParsed.append(atomString)
                                                   leftAtomCount = self.countAtoms(atomString, self.leftSide, self.productionWord)
                                                   rightAtomCount = self.countAtoms(atomString, self.rightSide, None)
                                                   if (leftAtomCount != rightAtomCount):
                                                           self.balanced = 0
                                   currentPart = currentPart.getNextWord()
                   currentWord = currentWord.getNextWord()
           </word:code>
           <word:return>
               <word:variable name="self.balanced"></word:variable>
           </word:return>
       </word:method>

       <word:method name="countAtoms">
           <word:variables>
               <word:variable name="atomString" type="string"/>
               <word:variable name="currentWord" type="word"/>
               <word:variable name="termination" type="word"/>
           </word:variables>
           <word:access></word:access>
           <word:code href="Python">
           atomCount = 0
           while (currentWord != termination):
                   # Only molecules contribute atoms.
                   if (currentWord.getName() == "Molecule"):
                           moleculeNumber = 1
                           currentPart = currentWord.getFirstWord()
                           while (currentPart != None):
                                   if (currentPart.getName() == "Number"):
                                           # Coefficient in front of the molecule.
                                           moleculeNumber = currentPart.getNumber()
                                   if (currentPart.getName() == "Atom"):
                                           if (currentPart.getMatch() == atomString):
                                                   # The subscript, if any, follows the atom.
                                                   subscriptNumber = 1
                                                   subscriptWord = currentPart.getNextWord()
                                                   if (subscriptWord != None):
                                                           if (subscriptWord.getName() == "Subscript"):
                                                                   subscriptNumber = subscriptWord.getNumber()
                                                   atomCount += subscriptNumber * moleculeNumber
                                   currentPart = currentPart.getNextWord()
                   currentWord = currentWord.getNextWord()
           </word:code>
           <word:return>
               <word:variable name="atomCount"></word:variable>
           </word:return>
       </word:method>

       <word:method name="isBalanced">
           <word:access></word:access>
           <word:code href="Python">
           </word:code>
           <word:return>
               <word:variable name="self.balanced"></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

The reaction word in the document is instantiated as the chemical reaction is matched. If "2H2 + O2 -> 2H2O" is entered, the program will run the document, matching all the words and checking if the equation is balanced. If the equation is balanced, the balanced attribute will be set in the reaction word.

Posting it on the Internet

We now have a small set of chemistry words. Each of these words is a discrete entity that can be linked to other words to form more powerful programs. Let's take advantage of that linkability now and post our sample chemistry language on the Internet. To do this, copy the reaction, atom, molecule, subscript, addition, production, and number words to a test directory on your Web server.

Once the words are published on the Internet, documents written in those words can be posted on the Internet and BlueBox will understand them. To load the words into BlueBox, type:

./bluebox.py --load="http://yourserver/atom.word"
./bluebox.py --load="http://yourserver/number.word"
./bluebox.py --load="http://yourserver/subscript.word"
./bluebox.py --load="http://yourserver/molecule.word"
./bluebox.py --load="http://yourserver/reaction.word"
./bluebox.py --compile="http://yourserver/atom.word"
./bluebox.py --compile="http://yourserver/number.word"
./bluebox.py --compile="http://yourserver/subscript.word"
./bluebox.py --compile="http://yourserver/molecule.word"
./bluebox.py --compile="http://yourserver/reaction.word"

BlueBox can now run any balanced equation document. In the future, BlueBox will automatically download the words it needs to understand a document from the Web, eliminating this manual loading step. A user, for example, will request a document from http://www.dloo.org/examples/balancedequation.txt and all of the words needed to run it will be downloaded with the document. In this way, users will browse through software with the same ease and transparency with which they browse the Web.

How to Add to the Language through Inheritance, Polymorphism, and Linking

As our language stands, it does not have any atoms. Never fear! Since it is on the Internet and words are polymorphic, a third party can extend the language by linking to it. A chemist, for example, who wanted to extend the language to include all of the atoms in the periodic table could do so without ever talking to the original creator of the Chemistry language by posting new words that link to the Atom word. The Molecule word would then polymorphically match words that inherit from Atom wherever it matched Atom.

The chemist could post Hydrogen:

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Hydrogen">
       <word:match matcher="absregex" expression="H"></word:match>
   </word:symbol>
   <word:inherits>
       <word:inherit>http://yoursite/Atom.word</word:inherit>
   </word:inherits>
   <word:definition>
       <word:method name="start">
           <word:access></word:access>
           <word:code href="Python">
           # The most common hydrogen isotope: one electron,
           # one proton, and no neutrons.
           self.numberOfElectrons = 1
           self.numberOfProtons = 1
           self.numberOfNeutrons = 0
           </word:code>
           <word:return>
               <word:variable></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

And Chlorine:

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Chlorine">
       <word:match matcher="absregex" expression="Cl"></word:match>
   </word:symbol>
   <word:inherits>
       <word:inherit>http://yoursite/Atom.word</word:inherit>
   </word:inherits>
   <word:definition>
       <word:method name="start">
           <word:access></word:access>
           <word:code href="Python">
           self.numberOfElectrons = 17
           self.numberOfProtons = 17
           self.numberOfNeutrons = 18
           </word:code>
           <word:return>
               <word:variable></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

And so on ...

A third party would then be able to download all of the original Chemistry words plus the new atoms that link to them and have a language that automatically balances equations with all of the new atoms. When the Molecule word is matched in the original language, it will polymorphically match all of the words that inherit from Atom wherever it matches Atom.
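
The polymorphic matching described here has a rough analogue in ordinary class inheritance. The standalone Python 3 sketch below is only an analogy under that assumption: the class layout, the protons attribute, and the match_atom helper are invented for illustration and say nothing about how BlueBox resolves inherited words. It shows a matcher that asks for an Atom and gets back the most specific word available.

```python
import re

class Atom:
    """Base word: any capitalized element-like symbol."""
    expression = r"[A-Z][a-z]?"
    protons = 0

class Hydrogen(Atom):
    expression = r"H"
    protons = 1

class Chlorine(Atom):
    expression = r"Cl"
    protons = 17

def all_subclasses(word):
    """Yield every word that inherits, directly or indirectly, from word."""
    for sub in word.__subclasses__():
        yield sub
        yield from all_subclasses(sub)

def match_atom(text):
    """Try the words that inherit from Atom first, then Atom itself."""
    for word in list(all_subclasses(Atom)) + [Atom]:
        if re.fullmatch(word.expression, text):
            return word()
    return None

print(type(match_atom("Cl")).__name__, match_atom("Cl").protons)
print(type(match_atom("H")).__name__)
print(type(match_atom("Na")).__name__)
```

Adding a Sodium class later would change the result for "Na" without touching match_atom, which is the kind of third-party extension the chemist's new words provide.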

Word languages can also be extended by directly linking to existing words in abstract regular expressions. A chemistry high school teacher, for example, might want to use the language in a different way: to grade homework papers. To do so, he could create a Problem word and a Chemistry Test word. The Problem word would link to the Reaction word and the Chemistry Test word would contain any number of "Problem" words.

Here's the Problem word. It matches the problem number plus an equation.

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Problem">
       <word:match matcher="absregex" expression="[0-9]*\)\s*{#Reaction}"/>
   </word:symbol>
   <word:definition>
       <word:method name="start">
           <word:access></word:access>
           <word:code href="Python">
           self.answer = 0
           </word:code>
           <word:return>
               <word:variable name="self.answer"></word:variable>
           </word:return>
       </word:method>
       <word:method name="parsing">
           <word:access></word:access>
           <word:code href="Python">
           # The only word matched inside a Problem is its Reaction.
           reaction = self.getFirstWord()
           self.answer = reaction.isBalanced()
           </word:code>
           <word:return>
               <word:variable name="self.answer"></word:variable>
           </word:return>
       </word:method>
       <word:method name="getAnswer">
           <word:access></word:access>
           <word:code href="Python">
           </word:code>
           <word:return>
               <word:variable name="self.answer"></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

The ChemistryTest word matches any number of Problem words, sums up the correct answers, and prints the student's score.

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="ChemistryTest">
       <word:match matcher="absregex" expression="(\s*{#Problem}\s*\n*)*"/>
   </word:symbol>
   <word:definition>
       <word:method name="parsing">
           <word:access></word:access>
           <word:code href="Python">
           self.score = 0
           numberOfProblems = 0
           # Each balanced reaction adds one point to the score.
           currentProblem = self.getFirstWord()
           while (currentProblem != None):
                   numberOfProblems += 1
                   self.score += currentProblem.getAnswer()
                   currentProblem = currentProblem.getNextWord()
           print "Student scored", self.score, "out of", numberOfProblems, "problems."
           </word:code>
           <word:return>
               <word:variable name="self.score"></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

Like the chemistry language itself, the chemistry test can be extended by third parties on the Internet. Other teachers could then inherit from the Problem word and create additional problems that students could be tested on in addition to balanced equations. The ChemistryTest word would then polymorphically match the new types of Problems that inherit from the Problem word to create a richer test with several types of problems.
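
For comparison, the grading flow that the Problem and ChemistryTest words implement can be sketched in standalone Python 3. Nothing here comes from BlueBox: the function names and the sample sheet are hypothetical, and the sketch assumes one numbered problem per line in the form "1) reaction".

```python
import re
from collections import Counter

PROBLEM = re.compile(r"\s*([0-9]*)\)\s*(.+)")
ATOM = re.compile(r"([A-Z][a-z]?)([0-9]*)")

def side_counts(side):
    """Count atoms of each element on one side of a reaction."""
    totals = Counter()
    for term in side.split("+"):
        coefficient, atoms = re.fullmatch(r"([0-9]*)(.*)", term.strip()).groups()
        for element, subscript in ATOM.findall(atoms):
            totals[element] += int(coefficient or 1) * int(subscript or 1)
    return totals

def grade(test_text):
    """Return (score, number of problems) for a sheet of numbered reactions."""
    score = problems = 0
    for line in test_text.splitlines():
        found = PROBLEM.fullmatch(line)
        if found is None:
            continue  # not a numbered problem
        problems += 1
        left, right = found.group(2).split("->")
        score += side_counts(left) == side_counts(right)
    return score, problems

SHEET = """1) 2H2 + O2 -> 2H2O
2) H2 + Cl2 -> HCl
3) CH4 + 2O2 -> CO2 + 2H2O
"""
print("Student scored %d out of %d problems." % grade(SHEET))
```

In the word version, a new problem type would be a word that inherits from Problem; in this sketch it would mean editing grade itself, which is the contrast the article draws.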

By endowing coded information with the same properties that made the Web so successful, it is possible to create a Web of software with the same potential for growth. The atoms and chemistry examples illustrate how software written in words can become scalably richer through linking. Like Web pages, the value of the chemistry language is not in itself, but in the network of words it resides in.

Conclusion

Internet-defined languages can have a significant impact on the evolution of the Internet. The Internet is defined by shared standards. By changing the way that standards can be defined and creating a technology where standards can be extended by anyone on the Internet, the speed at which Internet technologies evolve can be greatly accelerated.

In this picture of the Internet, standards are not implemented by committees, but dynamically defined by hundreds of people around the world, and then downloaded and assembled by a word compiler. To talk with another person or to understand a document that they have written, you don't have to write a specialized compiler, just dynamically download the discrete words that define the language being used. Since the core of the Internet consists of the standards (TCP/IP, SMTP, NFS, etc.) and its higher level languages (HTML, DHTML, Javascript, XML, RDF, etc.), this is a radically different vision of how the Internet can evolve.

Call for participation:

The open source community has always led in defining the languages and standards that make up the Internet. Word-oriented programming transforms the process by which the Internet is defined from being top-down by committee to bottom up by developers. Join us in bringing about this transformation. To participate, subscribe to the mailing list, learn more about word-oriented programming, and download the source.

More information on BlueBox, including the BlueBox source and examples, can be found at http://www.dloo.org.