Originally Published: Monday, 13 August 2001 Author: Nile Geisinger and the BlueBox Team
Published to: develop_articles/Development Articles Page: 1/3

An Introduction to Word-Oriented Programming with BlueBox

Exciting new open source projects are growing every day. Here at Linux.com we're delighted to bring them to your attention, especially projects like BlueBox, a program that introduces the concept of a linkable programming unit. These structures, known as words, can be published on the Internet and linked together to create richer software. Words have the same potential for growth as the Web since anyone can extend software published on the Internet simply by creating new words that link to it. The BlueBox team wrote this week's feature article just for Linux.com readers, so read on and get involved!

BlueBox   Page 1 of 3  >>

An Introduction to Word-Oriented Programming

Introduction

The Web made information discrete, linkable, and scalable. Discrete because it made it possible to distribute information a page at a time rather than as a bundle of pages bound into a book. Linkable because a reader could be transported from one Web page to another with a single click. Scalable because it made the value of a page proportional to the network of pages in which it resides.

BlueBox works by dynamically downloading the words it needs to understand documents. An introduction to BlueBox's features can be found here and an overview of its architecture can be found here.

Software today lacks each of these properties. Coded information is still pre-Web, mirroring how books and magazines were published before the appearance of dynamic content and hyperlinking. Software libraries consist of hundreds of objects bundled together that cannot coherently link to one another and do not scale between non-communicating developers.

Words bring the same properties to coded information that Web pages brought to textual information. Words are linkable programming units that allow libraries to be released as dozens of discrete units. Words can be published on the Internet and link to other words to create richer software. They have the same potential for growth as the Web, since anyone can extend software written in words simply by creating new words that link to it.

This paper is an introduction to word-oriented programming. It starts with a simple "Hello world!" program and moves from there to a complete chemistry language. It shows how to post words on the Web to create an Internet-defined language, how to extend software by linking to existing words, and how to inherit from words to create polymorphic languages. All of the examples, like BlueBox itself, are available under the GPL for readers to download and run.

Hello World in Words

Word-oriented programmers and object-oriented programmers approach problems from different perspectives. Whereas, in the object world, programmers create objects that make up systems, in the word world they create words that make up languages. The three cardinal properties of the Web are brought to software by replacing the concept of a modular thing (i.e., an object) with the concept of a modular piece of language (i.e., a word).

Despite their different approaches, words are supersets of objects. They have methods, data, and an additional structure for relating one word to other words. The power of words comes from this additional structure, which forces programmers to couple the logical and semantic relationships in a problem. This coupling creates a more powerful form of inheritance and polymorphism and makes software more flexible than the traditional object model allows.

Here is a simple Hello word that implements "Hello world":

   <?xml version="1.0"?>
   <word:word xmlns:word="http://www.dloo.com">
   <!-- Specify the Match to match against -->
   <word:symbol name="Hello">
       <word:match matcher="absregex" expression="Hello"></word:match>
   </word:symbol>
   <word:definition>
       <word:method name="start">
           <word:access></word:access>
           <word:code href="Python">
       print "Hello World!"
           </word:code>
           <word:return>
               <word:variable></word:variable>
           </word:return>
       </word:method>
   </word:definition>
   </word:word>

The matcher at the beginning of the Hello word defines the word's name and its symbol. Words are instantiated when their symbol appears in a document. The word interpreter works by parsing a document with the top-level word and then passing the parsing job to the next word. In this way, symbol by symbol, the word interpreter reads a document, instantiating a word for each symbol it encounters. The symbol in "Hello world!" is "Hello," so this word will be instantiated whenever a user types "Hello" or "Hello" is encountered in a document.
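To make the idea of instantiating a word per symbol concrete, here is a minimal sketch in modern Python. It is not BlueBox's interpreter: the HelloWord class, the interpret function, and the flat left-to-right scan are all assumptions made for illustration, and the sketch ignores the hand-off of parsing from one word to the next.

```python
import re


class HelloWord:
    """Hypothetical stand-in for the Hello word: a symbol plus a start method."""
    symbol = re.compile(r"Hello")

    def start(self):
        return "Hello World!"


def interpret(document, words):
    """Scan left to right, instantiating a word wherever its symbol matches."""
    instances = []
    pos = 0
    while pos < len(document):
        for word_cls in words:
            m = word_cls.symbol.match(document, pos)
            if m and m.end() > pos:
                instances.append(word_cls())  # one instance per symbol found
                pos = m.end()
                break
        else:
            pos += 1  # no symbol starts here; skip one character
    return instances


for word in interpret("Hello there. Hello again.", [HelloWord]):
    print(word.start())
```

Run on the sample text, the sketch finds the symbol twice and so creates two Hello instances, each of which produces the greeting.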

Symbols are written in an extended form of regular expressions called abstract regular expressions. Abstract regular expressions are based on the regular expressions of Perl and Python that match groups of characters concisely:

  • [0-9] is a regular expression that matches any single digit between zero and nine (e.g., "3" and "4" are possible matches).
  • [0-9]+ is a regular expression that matches any natural number (e.g., "64534" and "5" are possible matches). The plus sign specifies that the preceding expression must match one or more times.
  • [A-Z][a-z]{0,1} is a regular expression that matches a capital letter followed by an optional lowercase letter. The curly brackets tell the parser to match the previous expression (in this case, any letter between 'a' and 'z') zero or one time.
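The three patterns above are ordinary regular expressions, so they can be tried out with Python's standard re module. The variable names below are chosen for this illustration only; fullmatch is used so that a pattern succeeds only when it covers the whole string.

```python
import re

# The three patterns from the list above.
digit = re.compile(r"[0-9]")
natural = re.compile(r"[0-9]+")
capitalized = re.compile(r"[A-Z][a-z]{0,1}")

# fullmatch succeeds only when the entire string fits the pattern.
print(bool(digit.fullmatch("3")))            # True
print(bool(digit.fullmatch("34")))           # False: two digits
print(bool(natural.fullmatch("64534")))      # True
print(bool(capitalized.fullmatch("H")))      # True
print(bool(capitalized.fullmatch("He")))     # True
print(bool(capitalized.fullmatch("he")))     # False: no leading capital
```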

Abstract regular expressions extend regular expressions by allowing words to be matched as well. Words can be matched by embedding the name of a word, not its symbol, in curly brackets, prefaced by a "#" symbol. For example:

  • {#Number} would match one occurrence of a number word.
  • {#Number}+animals would match one or more number words followed by the text string 'animals'.
  • {#Withdraw} {#Number} {#Currency} would match the withdraw word followed by a number word followed by the currency word. The whole expression would match "withdraw 400 dollars" or "withdraw 800 pesos" depending on how the withdraw, number, and currency words were defined.

When words are matched in an abstract regular expression, they are instantiated and stored as a list in the word that matched them. Thus, if the transaction word matched "{#Withdraw} {#Number} {#Currency}" and the match succeeded, the transaction word would have a list containing instantiated withdraw, number, and currency words.
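The transaction example can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the SYMBOLS registry, the Word class, and the compile_abstract and match_words helpers are hypothetical names, the symbols chosen for the withdraw, number, and currency words are guesses, and the real absregex matcher may work quite differently. The sketch rewrites each {#Name} reference into a capturing group and then builds one instance per matched word; it does not handle repeated references such as {#Number}+.

```python
import re

# Hypothetical registry: word name -> plain regular expression for its symbol.
SYMBOLS = {
    "Withdraw": r"withdraw",
    "Number": r"[0-9]+",
    "Currency": r"dollars|pesos",
}


class Word:
    """An instantiated word: its name plus the text its symbol matched."""

    def __init__(self, name, text):
        self.name = name
        self.text = text

    def __repr__(self):
        return f"{self.name}({self.text!r})"


def compile_abstract(expression):
    """Turn each {#Name} into a capturing group; return the regex and the names in order."""
    names = []

    def substitute(ref):
        names.append(ref.group(1))
        return "(" + SYMBOLS[ref.group(1)] + ")"

    pattern = re.sub(r"\{#(\w+)\}", substitute, expression)
    return re.compile(pattern), names


def match_words(expression, text):
    """Return the list of instantiated words, or None if the match fails."""
    regex, names = compile_abstract(expression)
    m = regex.fullmatch(text)
    if m is None:
        return None
    return [Word(name, matched) for name, matched in zip(names, m.groups())]


print(match_words("{#Withdraw} {#Number} {#Currency}", "withdraw 400 dollars"))
# prints [Withdraw('withdraw'), Number('400'), Currency('dollars')]
```

The returned list plays the role of the list the transaction word would hold after a successful match.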

The second part of the "Hello" word is the definition of the word. The definition contains its methods and data and is analogous to the traditional class structure in object-oriented programming. In the "Hello" word's definition, there is only a single method, which prints out "Hello World!" Notice how the method specifies what language it is written in. This is necessary because words can themselves be written in words. Some of the first words being defined are for traditional programming languages like Perl, Python, and C.




